From: Phred Phungus on
Hello newsgroups,

As an exercise for an autodidact, I've been trying to write a reasonable
read directory utility in more or less standard C.

#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <string.h>
#include <limits.h>

#include <sys/types.h>
#include <sys/stat.h>
#include <dirent.h>
#include <unistd.h>

#define PATH_SIZE 300


int
main (void)
{
  DIR *dir = NULL;
  struct dirent entry;
  struct dirent *entryPtr = NULL;
  int retval = 0;
  unsigned count = 0;
  char pathName[PATH_SIZE + 1];

  char theDir[] = "/etc";

  /* Open the given directory, if we can. */
  dir = opendir (theDir);
  if (dir == NULL)
    {
      printf ("Error opening %s: %s\n", theDir, strerror (errno));
      return EXIT_FAILURE;
    }

  /* readdir_r() returns 0 on success; entryPtr is NULL at end of directory. */
  retval = readdir_r (dir, &entry, &entryPtr);
  while (retval == 0 && entryPtr != NULL)
    {
      struct stat entryInfo;
      int len;

      if ((strcmp (entry.d_name, ".") == 0) ||
          (strcmp (entry.d_name, "..") == 0))
        {
          /* Short-circuit the . and .. entries. */
          retval = readdir_r (dir, &entry, &entryPtr);
          continue;
        }

      /* snprintf() always NUL-terminates and reports truncation. */
      len = snprintf (pathName, sizeof pathName, "%s/%s", theDir, entry.d_name);
      if (len < 0 || (size_t) len >= sizeof pathName)
        {
          printf ("Skipping %s/%s: name too long\n", theDir, entry.d_name);
          retval = readdir_r (dir, &entry, &entryPtr);
          continue;
        }

      if (lstat (pathName, &entryInfo) == 0)
        {
          /* lstat() succeeded, let's party */
          count++;

          if (S_ISDIR (entryInfo.st_mode))
            {			/* directory */
              printf ("processing %s/\n", pathName);
            }
          else if (S_ISREG (entryInfo.st_mode))
            {			/* regular file */
              printf ("\t%s has %lld bytes\n",
                      pathName, (long long) entryInfo.st_size);
            }
          else if (S_ISLNK (entryInfo.st_mode))
            {			/* symbolic link */
              char targetName[PATH_SIZE + 1];
              ssize_t n = readlink (pathName, targetName, PATH_SIZE);
              if (n != -1)
                {
                  /* readlink() does not NUL-terminate the result. */
                  targetName[n] = '\0';
                  printf ("\t%s -> %s\n", pathName, targetName);
                }
              else
                {
                  printf ("\t%s -> (invalid symbolic link!)\n", pathName);
                }
            }
        }
      else
        {
          printf ("Error statting %s: %s\n", pathName, strerror (errno));
        }

      retval = readdir_r (dir, &entry, &entryPtr);
    }

  (void) closedir (dir);

  return 0;
}

// gcc -D_GNU_SOURCE -Wall -Wextra e3.c -o out

This compiles and behaves. I have any number of questions.

1) The program behaves identically whether theDir is /etc/ or /etc. Is
this because two slashes evaluate to one?

2) My much bigger question is one where I missed Geoff's response, and
I see Lew posting this now:

Geoff gave you the answer: use the getconf utility program.

getconf PATH_MAX $GivenDirectory

This question is about PATH_MAX.

3) How do I obtain permission to look at an arbitrary directory? I've
found that chmod 700 isn't enough for me to ask my desktop for
permission to see what's under the hood.

Thanks for your comment, and cheers,
--
fred
From: Ersek, Laszlo on
In article <7u6irfFv4uU1(a)mid.individual.net>, Phred Phungus <Phred(a)example.invalid> writes:

> 1) The program behaves identically whether theDir is /etc/ or /etc. Is
> this because two slashes evaluate to one?

http://www.opengroup.org/onlinepubs/007908775/xbd/glossary.html#tag_004_000_196
http://www.opengroup.org/onlinepubs/007908775/xbd/glossary.html#tag_004_000_198

http://www.opengroup.org/onlinepubs/007908775/xcu/basename.html
http://www.opengroup.org/onlinepubs/007908775/xcu/dirname.html

http://www.opengroup.org/onlinepubs/007908775/xsh/basename.html
http://www.opengroup.org/onlinepubs/007908775/xsh/dirname.html


> 2) My much bigger question is one where I missed Geoff's response, and
> I see Lew posting this now:
>
> Geoff gave you the answer: use the getconf utility program.
>
> getconf PATH_MAX $GivenDirectory
>
> This question is about PATH_MAX.

http://www.opengroup.org/onlinepubs/007908775/xsh/limits.h.html#tag_000_007_349_002
http://www.opengroup.org/onlinepubs/007908775/xcu/getconf.html


> 3) How do I obtain permission to look at an arbitrary directory? I've
> found that chmod 700 isn't enough for me to ask my desktop for
> permission to see what's under the hood.

To list the contents of directory "/a/b/c/d/e", you need effective +x
for all of /, "a", "b", "c", "d", and +r for "e". If you want to stat
files under "e" additionally, then you need +rx for "e".

For pathnames relative to the current working directory, replace the
initial "/" with "./" in the previous paragraph.

Cheers,
lacos
From: OldSchool on

> 3) How do I obtain permission to look at an arbitrary directory? I've
> found that chmod 700 isn't enough for me to ask my desktop for
> permission to see what's under the hood.

YOU don't. Permissions are granted to three classes of user: the
owner, the group, and the world.

Depending on the existing permissions, and who you are, you may not be
able to look, or to modify the permissions. User "root", or UID 0, is
exempt from this restriction, which leads many newbies to always log in
as root (if they can).

Again, "man chmod" will give you the information required to decode
the octal digits noted and figure out who can do what.
From: Phred Phungus on
Thx, lacos, those links have gotten me over this hurdle (I think):

$ gcc -D_GNU_SOURCE -Wall -Wextra e4.c -o out
$ ./out
path_max is 4096
name_max is 255
buf_size is 4353
$ cat e4.c
#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <string.h>
#include <limits.h>

#include <sys/types.h>
#include <sys/stat.h>
#include <dirent.h>
#include <unistd.h>

int
main (void)
{
  DIR *dir = NULL;
  long path_max;
  long name_max;
  long buf_size;
  char theDir[] = "/etc";

  /* Open the given directory, if we can. */
  dir = opendir (theDir);
  if (dir == NULL)
    {
      printf ("Error opening %s: %s\n", theDir, strerror (errno));
      return EXIT_FAILURE;
    }
  // pathconf data
  path_max = pathconf(theDir,_PC_PATH_MAX);
  name_max = pathconf(theDir,_PC_NAME_MAX);
  /* pathconf() returns a long, so print it with %ld */
  printf ("path_max is %ld\n", path_max);
  printf ("name_max is %ld\n", name_max);
  //name_max needs a byte for the null character and we need a slash
  buf_size = path_max + name_max + 2;
  printf ("buf_size is %ld\n", buf_size);

  (void) closedir (dir);

  return 0;
}

// gcc -D_GNU_SOURCE -Wall -Wextra e4.c -o out
$

I feel really good about this, as it represents a rather crooked path in
figuring out these data. My question is whether buf_size is of
sufficient size to handle any entry in the given directory.

Cheers,
--
fred


From: Alan Curry on
In article <7uh7alFf25U1(a)mid.individual.net>,
Phred Phungus <Phred(a)example.invalid> wrote:
| // pathconf data
| path_max = pathconf(theDir,_PC_PATH_MAX);
| name_max = pathconf(theDir,_PC_NAME_MAX);

I'm so tired of watching this slow-motion train wreck I'm going to try to
speed it up.

You've seemingly decided to dedicate the rest of your life to trying to
figure out the right way to use PATH_MAX and/or NAME_MAX and/or some "modern"
replacement for them.

Give up. There is no right way.

If you build a pathname by starting with a directory name that you have
successfully opendir'ed, appending a slash and a filename that you got from
readdir'ing it, there's a chance it'll be too long. Perfectly valid in terms
of syntax, and theoretically referring to an existing file, but rejected by
the kernel because it's longer than PATH_MAX.

This can only happen when a directory tree is very deep with normal-sized
names, or moderately deep with long names.

In spite of the fact that "directory name + slash + filename" is not
guaranteed to generate a usable pathname, it has been used by almost every
program that has ever done directory processing. There are modern approaches
that avoid this problem, but you won't be heavily criticized for not using
them, especially if you aren't recursing through the directory tree.
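
One family of interfaces that fits this description (an assumption here, since the post doesn't name them) is the POSIX.1-2008 *at() calls. fstatat() looks a name up relative to an open directory descriptor, so no "dir + slash + file" string is ever built. A minimal sketch; count_regular_files() is a hypothetical helper name:

```c
#define _POSIX_C_SOURCE 200809L	/* for fstatat() and dirfd() */
#include <assert.h>
#include <dirent.h>
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical helper: counts the regular files directly inside dirName
   without ever building "dir/name". Returns -1 if the directory cannot
   be opened. */
long
count_regular_files (const char *dirName)
{
  DIR *dir;
  struct dirent *entry;
  long count = 0;

  assert (dirName != NULL);
  dir = opendir (dirName);
  if (dir == NULL)
    return -1;

  while ((entry = readdir (dir)) != NULL)
    {
      struct stat info;

      if (strcmp (entry->d_name, ".") == 0
          || strcmp (entry->d_name, "..") == 0)
        continue;
      /* Resolve the name relative to the open directory; like lstat(),
         do not follow symbolic links. */
      if (fstatat (dirfd (dir), entry->d_name, &info,
                   AT_SYMLINK_NOFOLLOW) == 0
          && S_ISREG (info.st_mode))
        count++;
    }
  closedir (dir);
  return count;
}
```

Because each lookup starts at the directory descriptor, the length of the path that led to the directory no longer counts against the name being examined.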

So if you're going to build a "dir + slash + file" string, how big should the
buffer be? As big as it needs to be to hold the pieces! Not PATH_MAX, or
anything resembling it. It should be strlen(dir) plus strlen(file) plus 1 for
the slash and 1 for the NUL.

Yes, that means you can't allocate it before you start reading the directory.
Boo hoo.
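
A minimal sketch of that sizing rule, assuming malloc() as the allocation strategy; join_path() is a hypothetical helper name, not a library function:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a newly malloc'ed "dir/name" string sized
   exactly to fit, or NULL if malloc() fails. The caller frees it. */
char *
join_path (const char *dir, const char *name)
{
  size_t size;
  char *path;

  assert (dir != NULL && name != NULL);
  /* strlen(dir) + strlen(name) + 1 for the slash + 1 for the NUL */
  size = strlen (dir) + strlen (name) + 2;
  path = malloc (size);
  if (path != NULL)
    snprintf (path, size, "%s/%s", dir, name);
  return path;
}
```

Inside the readdir loop the result would be passed to lstat() or open() and then handed to free(), once per entry.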

If you use a fixed-size buffer to hold a pathname, you're imposing a limit on
the user of your program. Your buffer size may be based on PATH_MAX, which
in the best case scenario means that you may be imposing a limit that would
have been imposed by the kernel anyway.

But in another scenario, PATH_MAX or _PC_PATH_MAX will be lying, and you'll
be unnecessarily rejecting a pathname that would have been fine if you'd just
gone ahead and used it.

Why would they lie, you may ask? Because the limitation itself - that the
kernel can reject properly constructed pathnames because it doesn't like to
examine long strings - is moderately embarrassing, and as soon as someone in
kernel land gets irked by it, the next kernel release could have the limit
removed. That's what happened to ARG_MAX, which getconf still says is 131072
even though the true value has been "practically infinite" for quite some
time.

When you look at PATH_MAX, you're asking "how long of a pathname can I give
you, before you get grumpy and reject it?" And you're not even asking the
right entity. You're getting an answer that was built into glibc. Only the
kernel knows for sure, and it doesn't provide a way to ask the question.

If you want to know if some long pathname can be opened, you shouldn't ask
what the limit is. Just try to open the damn thing, and if you get an
ENAMETOOLONG, then you'll know for sure it was too long.
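
A minimal sketch of the "just try it" approach, assuming a read-only open is all that's wanted; try_open() is a hypothetical helper name:

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: tries to open path read-only. Returns 0 if the
   open succeeded, otherwise the errno value that open() reported
   (ENAMETOOLONG, ENOENT, EACCES, ...). */
int
try_open (const char *path)
{
  int fd;

  assert (path != NULL);
  fd = open (path, O_RDONLY);
  if (fd == -1)
    return errno;
  close (fd);
  return 0;
}
```

A caller compares the result with ENAMETOOLONG, or hands it to strerror() for the message, without consulting PATH_MAX at all.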

With a fixed-size buffer, picking the "right" size doesn't save you from the
responsibility to protect against buffer overflows. You'd still have to check
that things fit, and report an error when they don't.

A dynamically allocated buffer (VLA or malloc, take your pick) allows the
kernel to detect the "name too long" error, so you can report it with
perror() after a failed open, just like any other kind of open error. It
simplifies the code.

pathconf() is the worst of both worlds. You're still getting the compiled-in
glibc value that may be lagging years behind the actual running kernel's
limits (or lack thereof), so you have to do your own checking for the "too
long" error, but you didn't get the sole benefit of char buf[PATH_MAX+1]
which is its brevity.

--
Alan Curry