From: Jeff Layton on
On Wed, 16 Jun 2010 14:59:13 -0400
Valerie Aurora <vaurora(a)redhat.com> wrote:

> Who needs d_ino anyway? I am running a kernel with this patch -
> Gnome, a browser, IRC, kernel compile, etc. and everything works.
>
> -VAL
>
> commit 184f3919d0071f3bfa40010aa6919ea89999d79b
> Author: Valerie Aurora <vaurora(a)redhat.com>
> Date: Wed Jun 16 11:05:06 2010 -0700
>
> VFS: Always return 0 for d_ino
>
> Use of d_ino without the corresponding st_dev is always buggy in the
> presence of submounts, bind mounts, and union mounts. E.g., the d_ino
> of a mountpoint will be the inode number of the directory under the
> mountpoint, not the mounted directory. Correct code must call stat(),
> which returns the correct device ID and inode in st_dev and st_ino.
> Since no one should be using d_ino anyway, always return 0 to detect
> bugs.
>
> diff --git a/fs/readdir.c b/fs/readdir.c
> index dd3eae1..38ea772 100644
> --- a/fs/readdir.c
> +++ b/fs/readdir.c
> @@ -91,7 +91,9 @@ static int fillonedir(void * __buf, const char * name, int namlen, loff_t offset
>
> if (buf->result)
> return -EINVAL;
> - d_ino = ino;
> + /* Use of d_ino without st_dev is always buggy. */
> + d_ino = 0;
> +
> if (sizeof(d_ino) < sizeof(ino) && d_ino != ino) {
> buf->result = -EOVERFLOW;
> return -EOVERFLOW;
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to majordomo(a)vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>

That may run afoul of the following check (where d_ino is compared to
ino). You can probably just remove that check though since you can be
reasonably sure that 0 will never overflow the field.

--
Jeff Layton <jlayton(a)redhat.com>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Arnd Bergmann on
On Wednesday 16 June 2010 20:59:13 Valerie Aurora wrote:
> @@ -91,7 +91,9 @@ static int fillonedir(void * __buf, const char * name, int namlen, loff_t offset
>
> if (buf->result)
> return -EINVAL;
> - d_ino = ino;
> + /* Use of d_ino without st_dev is always buggy. */
> + d_ino = 0;
> +
> if (sizeof(d_ino) < sizeof(ino) && d_ino != ino) {
> buf->result = -EOVERFLOW;
> return -EOVERFLOW;

Isn't this just the path taken by sys_oldreaddir?

Glibc (at least on my box) translates all user calls to readdir into
sys_getdents or sys_getdents64, so I think you'd also need to change
filldir() and filldir64() for your testing.

Arnd
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Valerie Aurora on
On Wed, Jun 16, 2010 at 03:15:56PM -0400, Jeff Layton wrote:
> On Wed, 16 Jun 2010 14:59:13 -0400
> Valerie Aurora <vaurora(a)redhat.com> wrote:
>
> > Who needs d_ino anyway? I am running a kernel with this patch -
> > Gnome, a browser, IRC, kernel compile, etc. and everything works.
> >
> > -VAL
> >
> > commit 184f3919d0071f3bfa40010aa6919ea89999d79b
> > Author: Valerie Aurora <vaurora(a)redhat.com>
> > Date: Wed Jun 16 11:05:06 2010 -0700
> >
> > VFS: Always return 0 for d_ino
> >
> > Use of d_ino without the corresponding st_dev is always buggy in the
> > presence of submounts, bind mounts, and union mounts. E.g., the d_ino
> > of a mountpoint will be the inode number of the directory under the
> > mountpoint, not the mounted directory. Correct code must call stat(),
> > which returns the correct device ID and inode in st_dev and st_ino.
> > Since no one should be using d_ino anyway, always return 0 to detect
> > bugs.
> >
> > diff --git a/fs/readdir.c b/fs/readdir.c
> > index dd3eae1..38ea772 100644
> > --- a/fs/readdir.c
> > +++ b/fs/readdir.c
> > @@ -91,7 +91,9 @@ static int fillonedir(void * __buf, const char * name, int namlen, loff_t offset
> >
> > if (buf->result)
> > return -EINVAL;
> > - d_ino = ino;
> > + /* Use of d_ino without st_dev is always buggy. */
> > + d_ino = 0;
> > +
> > if (sizeof(d_ino) < sizeof(ino) && d_ino != ino) {
> > buf->result = -EOVERFLOW;
> > return -EOVERFLOW;
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> > the body of a message to majordomo(a)vger.kernel.org
> > More majordomo info at http://vger.kernel.org/majordomo-info.html
> >
>
> That may run afoul of the following check (where d_ino is compared to
> ino). You can probably just remove that check though since you can be
> reasonably sure that 0 will never overflow the field.

You're right, I included that in the next version of the patch.

Thanks,

-VAL
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: David Dillow on
On Wed, 2010-06-16 at 15:54 -0400, Valerie Aurora wrote:
> On Wed, Jun 16, 2010 at 02:59:13PM -0400, Valerie Aurora wrote:
> > Who needs d_ino anyway? I am running a kernel with this patch -
> > Gnome, a browser, IRC, kernel compile, etc. and everything works.

For large-scale Lustre users, not having a valid d_ino in the dirent is
a problem when we need to put a name (or names) to an inode number.

>From time to time, we have a problem reported in the logs against an
inode and need to figure out the name of that file. stat() is very slow
on these filesystems due to the need to talk to multiple servers to get
file size information. When we purge the filesystem, it is faster to do
a scan of the inode table looking for old files than it is to walk the
tree. Then we have to match those inodes to names for unlink().

For example, our main Lustre scratch space has over 285 million files in
it, and using find -inum takes over 72 hours to walk the tree using
stat(). Using a scanner that takes advantage of d_ino allows us to cover
that ground in 13 hours. It's not perfectly apples-to-apples, as the
custom scanner has some parallelism, but it gives you an idea of the
problem's scale. Certainly, using d_ino reduces the number of RPCs we
have to send.

Using ne2scan -- which uses libext2fs and combines the inode scan and
the name lookup -- takes over 48 hours to generate a list of candidate
files for the purge example. With an optimized inode scan and the custom
scanner above, it should take 18 hours, which is a considerable time
saver for us.

I can see that there are pitfalls in the presence of mountpoint and
such, but we control the environment on these machines, and using d_ino
is a huge win for us.
--
Dave Dillow
National Center for Computational Science
Oak Ridge National Laboratory
(865) 241-6602 office


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: J. R. Okajima on

David Dillow:
> For example, our main Lustre scratch space has over 285 million files in
> it, and using find -inum takes over 72 hours to walk the tree using
:::
> Using ne2scan -- which uses libext2fs and combines the inode scan and
> the name lookup -- takes over 48 hours to generate a list of candidate
> files for the purge example. With an optimized inode scan and the custom
:::

While I've never heard of ne2scan, I am interested in this simplified
problem such as "find the pathname(s) from an inum in a huge fs."
Is ne2scan essentially equivalent to "debugfs ncheck inum"?

About Valeris's patch, as long as "ls -i" is useful/helpful,
> + /* Use of d_ino without st_dev is always buggy. */
is not true.


J. R. Okajima

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/