fork + exec; what are the possible resource leaks? [Unix Programming]


From: Joshua Maurice on 16 Apr 2010 17:56

I'm somewhat new to POSIX. It seems that the only way to create a new
process is fork. However, fork inherits all file descriptors. exec
closes only the file descriptors marked as "close on exec". I
generally spawn a separate process because of the isolation this
affords. If a process misbehaves, like if it has a resource leak, I
know that when that process dies the resource leak will generally go
away. However, if a process misbehaves, like not setting "close on
exec" when opening the file descriptor (an option only available in
recent Linux kernels), it's possible that I will leak a file
descriptor to that child and all direct and indirect grandchildren.

So, how does one generally deal with this? Close all file descriptors
from 3 to the max possible file descriptor? "/proc/self/fd" is a good
alternative, but it is not portable in practice and not part of POSIX,
so it is not portable in theory either. What do other people do?
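
[Editor's sketch] The "/proc/self/fd" approach mentioned above might look
roughly like the following. This is a minimal sketch, not a vetted
implementation: close_from_procfs is a hypothetical name, the code assumes a
Linux-style procfs, and opendir() is not async-signal-safe, so calling it
between fork and exec in a multithreaded program is itself questionable.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: close every open descriptor >= lowfd by listing
 * /proc/self/fd. Returns 0 on success, or -1 if the directory cannot be
 * opened, in which case the caller should fall back to a plain loop. */
int close_from_procfs(int lowfd)
{
    assert(lowfd >= 0);

    DIR *dir = opendir("/proc/self/fd");
    if (dir == NULL)
        return -1;

    int self = dirfd(dir); /* the descriptor used for the listing itself */
    struct dirent *ent;
    while ((ent = readdir(dir)) != NULL) {
        char *end;
        long fd = strtol(ent->d_name, &end, 10);
        if (end == ent->d_name || *end != '\0')
            continue; /* not a number: skips "." and ".." */
        if (fd >= lowfd && fd != self)
            close((int)fd);
    }
    closedir(dir); /* also closes the listing descriptor */
    return 0;
}
```

A caller would typically try close_from_procfs(3) first and only loop over
every possible descriptor number when it returns -1.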

Also, what other resources should I be concerned about when doing a
fork + exec? What other possible resources can "leak" into the child
and all grandchildren?

PS: There really should be a spawn process ala win32. This should not
replace fork, but there should be an alternative to fork to bring up a
clean process. That or there should be sane interfaces to accomplish
the same: to guarantee that I don't have any random open resources
which I will continue to leak and leak into my direct and indirect
grandchildren. And no, posix_spawn is not that. It is defined to have
the same semantics as fork + exec and all of the baggage which comes
along with it. I'm just trying to program defensively, and POSIX is
making it hard for me to do that.

From: Chris Friesen on 16 Apr 2010 19:09

On 04/16/2010 03:56 PM, Joshua Maurice wrote:

> So, how does one generally deal with this? Close all file descriptors
> from 3 to the max possible file descriptor?

Yep.

If you want to be really anal, close all possible file descriptors and
then reopen 0/1/2 as desired.

There's good information on this at:

http://stackoverflow.com/questions/899038/getting-the-highest-allocated-file-descriptor

> Also, what other resources should I be concerned about when doing a
> fork + exec? What other possible resources can "leak" into the child
> and all grandchildren?

This is all covered in the man pages for fork() and exec(). Generally
open files of various kinds are what you need to worry about. Record
locks set with fcntl() are not inherited by the child across fork(),
but they are preserved across exec().

> PS: There really should be a spawn process ala win32. This should not
> replace fork, but there should be an alternative to fork to bring up a
> clean process.

Arguably, yes.

Chris

From: Scott Lurndal on 16 Apr 2010 19:19

Joshua Maurice <joshuamaurice(a)gmail.com> writes:

> However, if a process misbehaves, like not setting "close on
>exec" when opening the file descriptor (an option only available in
>recent Linux kernels)

The "Close on Exec" option has been part of _every_ unix and linux kernel
since basically forever. In Unix v7 it was an ioctl (FIOCLEX/FIONCLEX),
in System V it was made an fcntl(2) flag.

>, it's possible that I will leak a file
>descriptor to that child and all direct and indirect grandchildren.

Most applications that use fork/exec to spawn processes will stick
a loop between the fork and exec to close all file descriptors
except 0, 1 and 2 (and will often redirect those, perhaps to pipes,
as well)

Given that a process opened by the shell will typically (but not always)
have file descriptors 0, 1 and 2 in use and all others closed, the only
file descriptors you don't have control over are those used by
libraries. The above loop will accommodate applications which use libraries
that open files and forget to set CLOEXEC.
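
[Editor's sketch] The loop between fork and exec described above could be
written along these lines. The names close_fd_range and spawn_clean are
hypothetical, and the 1024 fallback is an assumption for the case where
sysconf() reports no limit; real code would also decide what to do with
descriptors 0, 1 and 2.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Close every descriptor in [lowfd, limit). Errors are ignored on purpose:
 * most of them are EBADF for numbers that were never open. */
void close_fd_range(int lowfd, long limit)
{
    assert(lowfd >= 0);
    for (long fd = lowfd; fd < limit; fd++)
        close((int)fd);
}

/* Hypothetical helper: run argv[0] with only descriptors 0, 1 and 2 open.
 * Returns the child's exit status, or -1 on error or abnormal exit. */
int spawn_clean(char *const argv[])
{
    /* Query the limit in the parent so the child does as little as possible. */
    long limit = sysconf(_SC_OPEN_MAX);
    if (limit < 0)
        limit = 1024; /* assumed fallback when no limit is reported */

    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        close_fd_range(3, limit);
        execvp(argv[0], argv);
        _exit(127); /* exec failed */
    }

    int status;
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR)
            return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

With char *argv[] = { "ls", "-l", NULL }; a call to spawn_clean(argv) runs ls
without any descriptor above 2 that a library may have left open.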

>
>So, how does one generally deal with this? Close all file descriptors
>from 3 to the max possible file descriptor?
Yes, this is the typical solution for applications that don't
control all the files that may be opened.

When I was on the X/Open base working group in the 90's, I lobbied
for a 'closeall' function that would close all file descriptors
above the provided fd, but it was never accepted (primarily since at
the time, X/Open didn't invent, but rather attempted to standardize
existing practice).

>Also, what other resources should I be concerned about when doing a
>fork + exec? What other possible resources can "leak" into the child
>and all grandchildren?

man exec.

>
>PS: There really should be a spawn process ala win32. This should not

man posix_spawn

scott

From: Joshua Maurice on 16 Apr 2010 20:21

On Apr 16, 4:19 pm, sc...(a)slp53.sl.home (Scott Lurndal) wrote:
> Joshua Maurice <joshuamaur...(a)gmail.com> writes:
> > However, if a process misbehaves, like not setting "close on
> >exec" when opening the file descriptor (an option only available in
> >recent Linux kernels)
>
> The "Close on Exec" option has been part of _every_ unix and linux kernel
> since basically forever. In Unix v7 it was an ioctl (FIOCLEX/FIONCLEX),
> in System V it was made an fcntl(2) flag.

Race condition. Up until a recent Linux kernel version, you could not
set close on exec in open; you could only set it with fcntl. In a
multithreaded program, there is a small window between open and fcntl
in which fork could be called, resulting in that file descriptor being
leaked. This hole was closed when it became possible to
specify O_CLOEXEC in the open call itself. See
http://udrepper.livejournal.com/20407.html
for full details.
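
[Editor's sketch] The difference between the two approaches can be shown side
by side. The function names here are hypothetical, and the sketch assumes a
system whose open() accepts O_CLOEXEC (POSIX.1-2008, or Linux 2.6.23 and
later).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Racy in a multithreaded program: another thread can fork and exec
 * after open() returns but before fcntl() sets the flag. */
int open_cloexec_racy(const char *path)
{
    assert(path != NULL);
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    int flags = fcntl(fd, F_GETFD);
    if (flags < 0 || fcntl(fd, F_SETFD, flags | FD_CLOEXEC) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Atomic: the descriptor is created with close-on-exec already set,
 * so there is no window in which a fork can inherit it unflagged. */
int open_cloexec_atomic(const char *path)
{
    assert(path != NULL);
    return open(path, O_RDONLY | O_CLOEXEC);
}

/* Returns 1 if fd has FD_CLOEXEC set, 0 otherwise. */
int has_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD);
    return flags >= 0 && (flags & FD_CLOEXEC) != 0;
}
```

Both functions end with the flag set; only the second guarantees that no
other thread can observe the descriptor without it.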

> >, it's possible that I will leak a file
> >descriptor to that child and all direct and indirect grandchildren.
>
> Most applications that use fork/exec to spawn processes will stick
> a loop between the fork and exec to close all file descriptors
> except 0, 1 and 2 (and will often redirect those, perhaps to pipes,
> as well)
>
> Given that a process opened by the shell will typically (but not always)
> have file descriptors 0, 1 and 2 in use and all others closed, the only
> file descriptors you don't have control over are those used by
> libraries. The above loop will accommodate applications which use libraries
> that open files and forget to set CLOEXEC.

> >So, how does one generally deal with this? Close all file descriptors
> >from 3 to the max possible file descriptor?
>
> Yes, this is the typical solution for applications that don't
> control all the files that may be opened.
>
> When I was on the X/Open base working group in the 90's, I lobbied
> for a 'closeall' function that would close all file descriptors
> above the provided fd, but it was never accepted (primarily since at
> the time, X/Open didn't invent, but rather attempted to standardize
> existing practice).

Yes, but the potential max can be quite large, and then the loop is just
wasted time. I suppose it's not that bad if the max is suitably small
and you're not spawning that many processes. I just hope I don't run
into a system where the max file descriptor is the 64-bit int max.

Then there's still the problem that I want to program defensively, and
not have to rely upon a library guarantee that it creates all file
handles "close on exec". Then when I'm doing automated testing of my
product, preferably I want to isolate these leaks for software which
is under development.

> >Also, what other resources should I be concerned about when doing a
> >fork + exec? What other possible resources can "leak" into the child
> >and all grandchildren?
>
> man exec.

Thanks for the terseness. [Sarcasm]. I was looking for more pearls of
wisdom from those more experienced, like common gotchas.

> >PS: There really should be a spawn process ala win32. This should not
>
> man posix_spawn

Did you even read my full post? I specifically mentioned that
posix_spawn is not that in the next sentence of my previous post, the
one to which you're replying. It carries all of the same semantics of
fork + exec, which includes possibly leaking over process boundaries.
That extra baggage is exactly what I don't want to deal with most of
the time. Most of the time, I just want to be able to create a new
process without worrying about leaked file handles, which signal masks
get inherited, etc.

From: William Ahern on 16 Apr 2010 20:30

Chris Friesen <cbf123(a)mail.usask.ca> wrote:
> On 04/16/2010 03:56 PM, Joshua Maurice wrote:

> > So, how does one generally deal with this? Close all file descriptors
> > from 3 to the max possible file descriptor?

> Yep.

> If you want to be really anal, close all possible file descriptors and
> then reopen 0/1/2 as desired.

> There's good information on this at:

> http://stackoverflow.com/questions/899038/getting-the-highest-allocated-file-descriptor

A good post, but it's missing the most portable option, getdtablesize(2).

It's often considered "non-portable", and yet it's available in Linux, *BSD,
AIX, Solaris, and HP-UX (at least according to their online documentation).

Some of the man pages say that it is equivalent to both the RLIMIT_NOFILE
soft-limit, and the descriptor table size. I'm unsure whether setrlimit will
successfully lower the soft-limit below the highest numbered descriptor
already allocated. In any event, getdtablesize() is the best fall-back for
when a local API (such as those mentioned in the URI above) isn't available.
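
[Editor's sketch] Using getdtablesize() as the fall-back bound might look
like this. fd_upper_bound is a hypothetical name, the _DEFAULT_SOURCE define
is an assumption about what glibc needs to expose the declaration, and the
final value of 256 is an arbitrary last-resort guess. The result is a limit,
not the highest descriptor actually open.

```c
#define _DEFAULT_SOURCE /* assumed necessary for getdtablesize() on glibc */
#include <assert.h>
#include <unistd.h>

/* Hypothetical helper: best-effort exclusive upper bound on descriptor
 * numbers, for use as the end of a close loop. */
long fd_upper_bound(void)
{
    long n = getdtablesize();
    if (n <= 0)
        n = sysconf(_SC_OPEN_MAX); /* POSIX way to ask the same question */
    if (n <= 0)
        n = 256; /* assumed last-resort guess */
    assert(n > 0);
    return n;
}
```

A close loop would then run from 3 up to, but not including,
fd_upper_bound().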
