From: Phred Phungus on
Alan Curry wrote:
> In article <7uh7alFf25U1(a)mid.individual.net>,
> Phred Phungus <Phred(a)example.invalid> wrote:
> | // pathconf data
> | path_max = pathconf(theDir,_PC_PATH_MAX);
> | name_max = pathconf(theDir,_PC_NAME_MAX);
>
> I'm so tired of watching this slow-motion train wreck I'm going to try to
> speed it up.
>
> You've seemingly decided to dedicate the rest of your life to trying to
> figure out the right way to use PATH_MAX and/or NAME_MAX and/or some "modern"
> replacement for them.
>
> Give up. There is no right way.
>
> If you build a pathname by starting with a directory name that you have
> successfully opendir'ed, appending a slash and a filename that you got from
> readdir'ing it, there's a chance it'll be too long. Perfectly valid in terms
> of syntax, and theoretically referring to an existing file, but rejected by
> the kernel because it's longer than PATH_MAX.
>
> This can only happen when a directory tree is very deep with normal-sized
> names, or moderately deep with long names.
>
> In spite of the fact that "directory name + slash + filename" is not
> guaranteed to generate a usable pathname, it has been used by almost every
> program that has ever done directory processing. There are modern approaches
> that avoid this problem, but you won't be heavily criticized for not using
> them, especially if you aren't recursing through the directory tree.
>
> So if you're going to build a "dir + slash + file" string, how big should the
> buffer be? As big as it needs to be to hold the pieces! Not PATH_MAX, or
> anything resembling it. It should be strlen(dir) plus strlen(file) plus 1 for
> the slash and 1 for the NUL.
>
> Yes, that means you can't allocate it before you start reading the directory.
> Boo hoo.
>
> If you use a fixed-size buffer to hold a pathname, you're imposing a limit on
> the user of your program. Your buffer size may be based on PATH_MAX, which
> in the best case scenario means that you may be imposing a limit that would
> have been imposed by the kernel anyway.
>
> But in another scenario, PATH_MAX or _PC_PATH_MAX will be lying, and you'll
> be unnecessarily rejecting a pathname that would have been fine if you'd just
> gone ahead and used it.
>
> Why would they lie, you may ask? Because the limitation itself - that the
> kernel can reject properly constructed pathnames because it doesn't like to
> examine long strings - is moderately embarrassing, and as soon as someone in
> kernel land gets irked by it, the next kernel release could have the limit
> removed. That's what happened to ARG_MAX, which getconf still says is 131072
> even though the true value has been "practically infinite" for quite some
> time.
>
> When you look at PATH_MAX, you're asking "how long of a pathname can I give
> you, before you get grumpy and reject it?" And you're not even asking the
> right entity. You're getting an answer that was built into glibc. Only the
> kernel knows for sure, and it doesn't provide a way to ask the question.
>
> If you want to know if some long pathname can be opened, you shouldn't ask
> what the limit is. Just try to open the damn thing, and if you get an
> ENAMETOOLONG, then you'll know for sure it was too long.
>
> With a fixed-size buffer, picking the "right" size doesn't save you from the
> responsibility to protect against buffer overflows. You'd still have to check
> that things fit, and report an error when they don't.
>
> A dynamically allocated buffer (VLA or malloc, take your pick) allows the
> kernel to detect the "name too long" error, so you can report it with
> perror() after a failed open, just like any other kind of open error. It
> simplifies the code.
>
> pathconf() is the worst of both worlds. You're still getting the compiled-in
> glibc value that may be lagging years behind the actual running kernel's
> limits (or lack thereof), so you have to do your own checking for the "too
> long" error, but you didn't get the sole benefit of char buf[PATH_MAX+1]
> which is its brevity.
>


I think I understand you.
--
fred
From: Phred Phungus on
Alan Curry wrote:
> In article <7uh7alFf25U1(a)mid.individual.net>,
> Phred Phungus <Phred(a)example.invalid> wrote:
> | // pathconf data
> | path_max = pathconf(theDir,_PC_PATH_MAX);
> | name_max = pathconf(theDir,_PC_NAME_MAX);
>
> I'm so tired of watching this slow-motion train wreck I'm going to try to
> speed it up.
>
> You've seemingly decided to dedicate the rest of your life to trying to
> figure out the right way to use PATH_MAX and/or NAME_MAX and/or some "modern"
> replacement for them.
>
> Give up. There is no right way.

There's worse things than slow-motion train wrecks.

I doubt that you can sense my life's dedications other than ones that
apply to the rather parochial objectives that I laid out in the original
post.

If there's no right way, is there also no *better* way?
>
> If you build a pathname by starting with a directory name that you have
> successfully opendir'ed, appending a slash and a filename that you got from
> readdir'ing it, there's a chance it'll be too long. Perfectly valid in terms
> of syntax, and theoretically referring to an existing file, but rejected by
> the kernel because it's longer than PATH_MAX.
>
> This can only happen when a directory tree is very deep with normal-sized
> names, or moderately deep with long names.
>
> In spite of the fact that "directory name + slash + filename" is not
> guaranteed to generate a usable pathname, it has been used by almost every
> program that has ever done directory processing. There are modern approaches
> that avoid this problem, but you won't be heavily criticized for not using
> them, especially if you aren't recursing through the directory tree.

Well this is all pretty revealing to me. I hardly know what to say
about it other than that it seems to be an indictment of an OS written in C.
>
> So if you're going to build a "dir + slash + file" string, how big should the
> buffer be? As big as it needs to be to hold the pieces! Not PATH_MAX, or
> anything resembling it. It should be strlen(dir) plus strlen(file) plus 1 for
> the slash and 1 for the NUL.

Is this not the better way? You malloc the above.
>
> Yes, that means you can't allocate it before you start reading the directory.
> Boo hoo.
>
> If you use a fixed-size buffer to hold a pathname, you're imposing a limit on
> the user of your program. Your buffer size may be based on PATH_MAX, which
> in the best case scenario means that you may be imposing a limit that would
> have been imposed by the kernel anyway.
>
> But in another scenario, PATH_MAX or _PC_PATH_MAX will be lying, and you'll
> be unnecessarily rejecting a pathname that would have been fine if you'd just
> gone ahead and used it.
>
> Why would they lie, you may ask? Because the limitation itself - that the
> kernel can reject properly constructed pathnames because it doesn't like to
> examine long strings - is moderately embarrassing, and as soon as someone in
> kernel land gets irked by it, the next kernel release could have the limit
> removed. That's what happened to ARG_MAX, which getconf still says is 131072
> even though the true value has been "practically infinite" for quite some
> time.
>
> When you look at PATH_MAX, you're asking "how long of a pathname can I give
> you, before you get grumpy and reject it?" And you're not even asking the
> right entity. You're getting an answer that was built into glibc. Only the
> kernel knows for sure, and it doesn't provide a way to ask the question.
>
> If you want to know if some long pathname can be opened, you shouldn't ask
> what the limit is. Just try to open the damn thing, and if you get an
> ENAMETOOLONG, then you'll know for sure it was too long.
>
> With a fixed-size buffer, picking the "right" size doesn't save you from the
> responsibility to protect against buffer overflows. You'd still have to check
> that things fit, and report an error when they don't.
>
> A dynamically allocated buffer (VLA or malloc, take your pick) allows the
> kernel to detect the "name too long" error, so you can report it with
> perror() after a failed open, just like any other kind of open error. It
> simplifies the code.
>
> pathconf() is the worst of both worlds. You're still getting the compiled-in
> glibc value that may be lagging years behind the actual running kernel's
> limits (or lack thereof), so you have to do your own checking for the "too
> long" error, but you didn't get the sole benefit of char buf[PATH_MAX+1]
> which is its brevity.
>

Thanks for your response, Alan. I have one more question.


/* `sysconf', `pathconf', and `confstr' NAME values. Generic version.
Copyright (C) 1993,1995-1998,2000,2001,2003,2004,2007
Free Software Foundation, Inc.
This file is part of the GNU C Library.

The GNU C Library is free software; you can redistribute it and/or
modify it under the terms of the GNU Lesser General Public
License as published by the Free Software Foundation; either
version 2.1 of the License, or (at your option) any later version.

The GNU C Library is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
Lesser General Public License for more details.

You should have received a copy of the GNU Lesser General Public
License along with the GNU C Library; if not, write to the Free
Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA
02111-1307 USA. */

#ifndef _UNISTD_H
# error "Never use <bits/confname.h> directly; include <unistd.h> instead."
#endif

/* Values for the NAME argument to `pathconf' and `fpathconf'. */
enum
{
    _PC_LINK_MAX,
#define _PC_LINK_MAX _PC_LINK_MAX
    _PC_MAX_CANON,
#define _PC_MAX_CANON _PC_MAX_CANON
    _PC_MAX_INPUT,
#define _PC_MAX_INPUT _PC_MAX_INPUT
    _PC_NAME_MAX,
#define _PC_NAME_MAX _PC_NAME_MAX
    _PC_PATH_MAX,
#define _PC_PATH_MAX _PC_PATH_MAX
    _PC_PIPE_BUF,
#define _PC_PIPE_BUF _PC_PIPE_BUF
    _PC_CHOWN_RESTRICTED,
#define _PC_CHOWN_RESTRICTED _PC_CHOWN_RESTRICTED
    _PC_NO_TRUNC,
#define _PC_NO_TRUNC _PC_NO_TRUNC
    _PC_VDISABLE,

Do I understand you correctly that this stuff is sitting in a can as
opposed to being read on booting?
--
fred
From: Alan Curry on
In article <7upg57Fm4kU1(a)mid.individual.net>,
Phred Phungus <Phred(a)example.invalid> wrote:
|Alan Curry wrote:
|> You've seemingly decided to dedicate the rest of your life to trying to
|> figure out the right way to use PATH_MAX and/or NAME_MAX and/or some "modern"
|> replacement for them.
|>
|> Give up. There is no right way.
|
|There's worse things than slow-motion train wrecks.
|
|I doubt that you can sense my life's dedications other than ones that
|apply to the rather parochial objectives that I laid out in the original
|post.
|
|If there's no right way, is there also no *better* way?

The "better way" to open a file using a name relative to a directory is
openat(). It's pretty new. UNIX got along for quite a long time without it,
using "dirname + slash + filename" strings, in spite of the inherent flaw
in that method (i.e. you can slam into PATH_MAX and die horribly).
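
A minimal sketch of the openat() approach, assuming a POSIX.1-2008 system.
The helper name open_in_dir() is made up for illustration; the point is only
that the directory and the filename are handed over separately, so no
"dirname + slash + filename" string is ever built:

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: open `name` read-only, resolved relative to the
 * directory `dir`, without concatenating the two into one pathname.
 * Returns a file descriptor, or -1 with errno set by open()/openat(). */
int open_in_dir(const char *dir, const char *name)
{
    int dfd = open(dir, O_RDONLY | O_DIRECTORY);
    if (dfd == -1)
        return -1;

    int fd = openat(dfd, name, O_RDONLY);

    /* close() may overwrite errno, so keep openat()'s value for the caller. */
    int saved = errno;
    close(dfd);
    errno = saved;
    return fd;
}
```

A caller that already holds a descriptor for the directory (from dirfd() on
an opendir() stream, say) would skip the open() and call openat() directly.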

There's also an "in between way", using fchdir() to temporarily chdir into
the directory, open the filename, then fchdir() back to the old directory
(which you can get a handle to with open(".", O_RDONLY) if you didn't already
have one). This avoids any possible PATH_MAX problem, but

And of course the "best way" when recursing is probably to use a library
routine specifically designed for recursive directory processing, like
fts() or nftw().
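
For the library-routine option, here is a rough sketch using fts(), assuming
a libc that ships <fts.h> (glibc and the BSDs do; it is not part of POSIX).
The function name count_regular_files() is hypothetical and counting files is
just a stand-in for whatever per-file work is wanted:

```c
#include <sys/types.h>
#include <sys/stat.h>
#include <fts.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: walk the tree rooted at `root` and return the
 * number of regular files found, or -1 if the walk could not start.
 * FTS_PHYSICAL means symbolic links are not followed. Without
 * FTS_NOCHDIR, fts may chdir as it descends; fts_close() restores the
 * starting directory. */
long count_regular_files(const char *root)
{
    char *const roots[] = { (char *)root, NULL };
    FTS *tree = fts_open(roots, FTS_PHYSICAL, NULL);
    if (tree == NULL)
        return -1;

    long count = 0;
    FTSENT *ent;
    while ((ent = fts_read(tree)) != NULL) {
        switch (ent->fts_info) {
        case FTS_F:     /* regular file */
            count++;
            break;
        case FTS_DNR:   /* directory that could not be read */
        case FTS_ERR:   /* some other error */
        case FTS_NS:    /* stat() failed */
            fprintf(stderr, "%s: %s\n", ent->fts_path,
                    strerror(ent->fts_errno));
            break;
        default:        /* directories, symlinks, etc. are skipped */
            break;
        }
    }
    fts_close(tree);
    return count;
}
```

Something like count_regular_files(".") would walk the current directory.
nftw() covers similar ground with a callback instead of a read loop.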

|>
|> In spite of the fact that "directory name + slash + filename" is not
|> guaranteed to generate a usable pathname, it has been used by almost every
|> program that has ever done directory processing. There are modern approaches
|> that avoid this problem, but you won't be heavily criticized for not using
|> them, especially if you aren't recursing through the directory tree.
|
|Well this is all pretty revealing to me. I hardly know what to say
|about it other than that it seems to be an indictment of an OS written in C.

I'm not sure being written in C is relevant. In whatever language, the choice
would still be between "copy entire string into kernel memory, then break it
up into components and process them" and "copy the components into kernel
memory one at a time, processing as you go".

|>
|> So if you're going to build a "dir + slash + file" string, how big should the
|> buffer be? As big as it needs to be to hold the pieces! Not PATH_MAX, or
|> anything resembling it. It should be strlen(dir) plus strlen(file) plus 1 for
|> the slash and 1 for the NUL.
|
|Is this not the better way? You malloc the above.

Right. You were going to end up malloc'ing (or VLA allocating) the result of
your pathconf() calls though, weren't you?
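
The malloc version of "strlen(dir) plus strlen(file) plus 1 for the slash and
1 for the NUL" might look roughly like this. It is only a sketch; join_path()
and open_joined() are hypothetical names, and the error handling is the bare
minimum. The length check is left to the kernel, which answers with
ENAMETOOLONG if it doesn't like the result:

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a malloc'ed "dir/file" string sized to fit
 * exactly, or NULL if malloc fails. The caller frees it. */
char *join_path(const char *dir, const char *file)
{
    size_t len = strlen(dir) + 1 + strlen(file) + 1;  /* slash + NUL */
    char *path = malloc(len);
    if (path != NULL)
        snprintf(path, len, "%s/%s", dir, file);
    return path;
}

/* Hypothetical helper: open dir/file read-only and report any failure,
 * "name too long" included, the same way as every other open() error.
 * Returns a descriptor, or -1 with errno describing the failure. */
int open_joined(const char *dir, const char *file)
{
    char *path = join_path(dir, file);
    if (path == NULL)
        return -1;

    int fd = open(path, O_RDONLY);
    int saved = errno;          /* perror()/free() may disturb errno */
    if (fd == -1) {
        errno = saved;
        perror(path);
    }
    free(path);
    errno = saved;
    return fd;
}
```

No PATH_MAX and no pathconf() anywhere: the buffer is as big as the pieces,
and the only limit applied is the one the running kernel actually enforces.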

|Thanks for your response, Alan. I have one more question.
|
snipped copy of <bits/confname.h>
|
|Do I understand you correctly that this stuff is sitting in a can as
|opposed to being read on booting?

I don't know what you are getting at. Header files are read when compiling,
not related to system boot at all. You can remove all your header files and
booting will still work. You'd have to put them back before you can compile
anything though.

--
Alan Curry
From: Phred Phungus on
Alan Curry wrote:
> In article <7upg57Fm4kU1(a)mid.individual.net>,
> Phred Phungus <Phred(a)example.invalid> wrote:
> |Alan Curry wrote:
> |> You've seemingly decided to dedicate the rest of your life to trying to
> |> figure out the right way to use PATH_MAX and/or NAME_MAX and/or some "modern"
> |> replacement for them.
> |>
> |> Give up. There is no right way.
> |
> |There's worse things than slow-motion train wrecks.
> |
> |I doubt that you can sense my life's dedications other than ones that
> |apply to the rather parochial objectives that I laid out in the original
> |post.
> |
> |If there's no right way, is there also no *better* way?
>
> The "better way" to open a file using a name relative to a directory is
> openat(). It's pretty new. UNIX got along for quite a long time without it,
> using "dirname + slash + filename" strings, in spite of the inherent flaw
> in that method (i.e. you can slam into PATH_MAX and die horribly).
>
> There's also an "in between way", using fchdir() to temporarily chdir into
> the directory, open the filename, then fchdir() back to the old directory
> (which you can get a handle to with open(".", O_RDONLY) if you didn't already
> have one). This avoids any possible PATH_MAX problem, but
>
> And of course the "best way" when recursing is probably to use a library
> routine specifically designed for recursive directory processing, like
> fts() or nftw().

I've looked at these in my flailing about with directories of the last 2
months.

http://www.opengroup.org/onlinepubs/000095399/idx/if.html

My unix reference link above isn't new enough to have openat(2) and fts(3).

Is there an authoritative online function reference for contemporary unix?
>
> |>
> |> In spite of the fact that "directory name + slash + filename" is not
> |> guaranteed to generate a usable pathname, it has been used by almost every
> |> program that has ever done directory processing. There are modern approaches
> |> that avoid this problem, but you won't be heavily criticized for not using
> |> them, especially if you aren't recursing through the directory tree.
> |
> |Well this is all pretty revealing to me. I hardly know what to say
> |about it other than that it seems to be an indictment of an OS written in C.
>
> I'm not sure being written in C is relevant. In whatever language, the choice
> would still be between "copy entire string into kernel memory, then break it
> up into components and process them" and "copy the components into kernel
> memory one at a time, processing as you go".

Ok.
>
> |>
> |> So if you're going to build a "dir + slash + file" string, how big should the
> |> buffer be? As big as it needs to be to hold the pieces! Not PATH_MAX, or
> |> anything resembling it. It should be strlen(dir) plus strlen(file) plus 1 for
> |> the slash and 1 for the NUL.
> |
> |Is this not the better way? You malloc the above.
>
> Right. You were going to end up malloc'ing (or VLA allocating) the result of
> your pathconf() calls though, weren't you?

I wanted to know how big they could be. You are correct that I was
stuck on how to do the malloc'ing and checking the result.

I'll be happy to just skip it and move on to something else.
>
> |Thanks for your response, Alan. I have one more question.
> |
> snipped copy of <bits/confname.h>
> |
> |Do I understand you correctly that this stuff is sitting in a can as
> |opposed to being read on booting?
>
> I don't know what you are getting at. Header files are read when compiling,
> not related to system boot at all. You can remove all your header files and
> booting will still work. You'd have to put them back before you can compile
> anything though.
>

Can one say that _PC_PATH_MAX is part of the kernel?
--
fred
From: Alan Curry on
In article <7uui0lFm87U1(a)mid.individual.net>,
Phred Phungus <Phred(a)example.invalid> wrote:
|Alan Curry wrote:
|
|http://www.opengroup.org/onlinepubs/000095399/idx/if.html
|
|My unix reference link above isn't new enough to have openat(2) and fts(3).
|
|Is there an authoritative online function reference for contemporary unix?

A newer version of the same spec is here:

http://www.opengroup.org/onlinepubs/9699919799/mindex.html

I have no idea why they hide them under inexplicable names like "000095399"
and "9699919799" instead of some naming scheme that would enable us to find
the newer version more easily.

Aside from that, you should have some local man pages.

|> |Do I understand you correctly that this stuff is sitting in a can as
|> |opposed to being read on booting?
|>
|> I don't know what you are getting at. Header files are read when compiling,
|> not related to system boot at all. You can remove all your header files and
|> booting will still work. You'd have to put them back before you can compile
|> anything though.
|>
|
|Can one say that _PC_PATH_MAX is part of the kernel?

Not really... it is used to dynamically query the maximum allowable pathname
length, which is a kernel limitation, but _PC_PATH_MAX and the other
constants you can pass to pathconf() are just part of the libc interface. The
kernel never sees them.

_PC_PATH_MAX is a numeric constant, used by libc and the callers of libc
(e.g. you) as part of the libc API.
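
To make that concrete, a small sketch (report_path_max() is a made-up name)
that prints the selector constant itself and then whatever pathconf() reports
for a directory. The printed numbers depend on the libc and system, so none
are claimed here:

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: print the value of the _PC_PATH_MAX selector and
 * the limit pathconf() reports for `dir`. Returns that limit, or -1 if
 * there is no limit (errno == 0) or the call failed (errno != 0). */
long report_path_max(const char *dir)
{
    /* The constant is only a small integer naming which limit is wanted. */
    printf("_PC_PATH_MAX selector = %d\n", (int)_PC_PATH_MAX);

    errno = 0;
    long limit = pathconf(dir, _PC_PATH_MAX);
    int err = errno;

    if (limit != -1)
        printf("%s: reported limit %ld\n", dir, limit);
    else if (err == 0)
        printf("%s: no limit reported\n", dir);
    else
        fprintf(stderr, "%s: pathconf failed (errno %d)\n", dir, err);

    errno = err;
    return limit;
}
```

Calling report_path_max("/") shows both numbers side by side: one is the
name of the question, the other is the answer.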

--
Alan Curry