From: Aneesh Kumar K. V on
On Wed, 7 Jul 2010 10:45:11 -0400, "J. Bruce Fields" <bfields(a)fieldses.org> wrote:
> On Wed, Jul 07, 2010 at 03:35:50PM +0200, Miklos Szeredi wrote:
> > On Wed, 7 Jul 2010, J. Bruce Fields wrote:
> > > > > If you use sys or proc, is it possible to get the uuid from a file
> > > > > descriptor or pathname without races?
> > > >
> > > > You can do stat/fstat to find out the device number (which is unique,
> > > > but not persistent)
> > >
> > > Is it really unique over time? (Can't a given st_dev value map to one
> > > filesystem now, and another later?)
> >
> > It's unique at a single point in time. But if you have a reference
> > (e.g. open file descriptor) on the mount then that's not a problem.
> >
> > fd = open(path, ...);
> > fstat(fd, &st);
> > search st.st_dev in mountinfo
> > close(fd)
> >
> > is effectively the same as a getuuid(path) syscall (lazy unmounted
> > filesystems will not be found in mountinfo, but the reference is still
> > there so st_dev will not be reused for other filesystems).
>
> OK, cool.
>
> That still leaves the problem that there isn't always an underlying
> block device, and/or when there is it doesn't always uniquely specify
> the filesystem.
>

And for this reason we would need this as a syscall, right?
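
A minimal C sketch of the open/fstat/mountinfo sequence quoted above, assuming the standard /proc/self/mountinfo layout in which the third field is major:minor. The helper names find_mount_by_dev and lookup_mount_for_path are hypothetical, and the sketch returns only the first matching entry (bind mounts produce several):

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>
#include <unistd.h>

/*
 * Scan a mountinfo-format stream for the first entry whose third field
 * (major:minor) matches the given device numbers. On a match, copy the
 * line into buf and return 0; return -1 if no entry matches.
 */
int find_mount_by_dev(FILE *fp, unsigned int maj, unsigned int min,
                      char *buf, size_t buflen)
{
    char line[4096];
    unsigned int id, parent, lmaj, lmin;

    while (fgets(line, sizeof(line), fp)) {
        if (sscanf(line, "%u %u %u:%u", &id, &parent, &lmaj, &lmin) != 4)
            continue;
        if (lmaj == maj && lmin == min) {
            snprintf(buf, buflen, "%s", line);
            return 0;
        }
    }
    return -1;
}

/*
 * open + fstat + mountinfo search. The descriptor stays open during the
 * search so that st_dev cannot be reused by another filesystem meanwhile.
 */
int lookup_mount_for_path(const char *path, char *buf, size_t buflen)
{
    struct stat st;
    FILE *fp;
    int fd, ret = -1;

    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    if (fstat(fd, &st) == 0) {
        fp = fopen("/proc/self/mountinfo", "r");
        if (fp) {
            ret = find_mount_by_dev(fp, major(st.st_dev),
                                    minor(st.st_dev), buf, buflen);
            fclose(fp);
        }
    }
    close(fd);  /* drop the reference only after the search */
    return ret;
}
```

This still requires parsing a text file for every lookup, which is the cost a dedicated syscall would avoid.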

-aneesh
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Andreas Dilger on
On 2010-07-07, at 09:05, J. Bruce Fields wrote:
> On Wed, Jul 07, 2010 at 01:40:53AM -0600, Andreas Dilger wrote:
>> On 2010-07-06, at 11:09, Aneesh Kumar K. V wrote:
>>> Since we know that system wide file handle should include a file system
>>> identifier and a file identifier my plan was to retrieve both in the
>>> same syscall.
>>
>> Won't having it be in a separate system call be racy w.r.t. doing the pathname lookup twice?
>
> It'll be rare that a server will want to *just* get a filehandle;
> normally it will at least want to get some attributes at the same time.
> So I think it will always need to open the file first and then do the
> rest of the operations on the returned filehandle.

I think you are assuming too much about the use of the file handle. What I'm interested in is not a userspace file server, but rather a more efficient way to have 10000's to millions of clients to be able to open the same regular file, without having to do full path traversal for each one.

>>> That still leaves the problem that there isn't always an underlying
>>> block device, and/or when there is it doesn't always uniquely specify
>>> the filesystem.
>>
>> And for this reason we would need this as a syscall right ?
>
> That's the only solution I see. (Or use an xattr?)

Or... return the UUID as part of the file handle in the first place. That avoids races, avoids adding more syscalls that have to be called for each file handle, and avoids what is IMNSHO the worst proposal: requiring applications to parse a text file in some obscure path for each file handle (which means a stat() to find the major/minor device of the file, walking through /proc or /sys, and other nastiness).

Cheers, Andreas
--
Andreas Dilger
Lustre Technical Lead
Oracle Corporation Canada Inc.

From: Aneesh Kumar K. V on
On Wed, 7 Jul 2010 11:02:47 -0600, Andreas Dilger <andreas.dilger(a)oracle.com> wrote:
> On 2010-07-07, at 09:05, J. Bruce Fields wrote:
> > On Wed, Jul 07, 2010 at 01:40:53AM -0600, Andreas Dilger wrote:
> >> On 2010-07-06, at 11:09, Aneesh Kumar K. V wrote:
> >>> Since we know that system wide file handle should include a file system
> >>> identifier and a file identifier my plan was to retrieve both in the
> >>> same syscall.
> >>
> >> Won't having it be in a separate system call be racy w.r.t. doing the pathname lookup twice?
> >
> > It'll be rare that a server will want to *just* get a filehandle;
> > normally it will at least want to get some attributes at the same time.
> > So I think it will always need to open the file first and then do the
> > rest of the operations on the returned filehandle.
>
> I think you are assuming too much about the use of the file handle.
> What I'm interested in is not a userspace file server, but rather a
> more efficient way to have 10000's to millions of clients to be able
> to open the same regular file, without having to do full path
> traversal for each one.


With the suggested syscall approach, the client that does the path
traversal can do:

fd = open(name);
file_identifier = fd_to_handle(fd);
fs_identifier = fd_to_fshandle(fd);
close(fd);


>
> >>> That still leaves the problem that there isn't always an underlying
> >>> block device, and/or when there is it doesn't always uniquely specify
> >>> the filesystem.
> >>
> >> And for this reason we would need this as a syscall right ?
> >
> > That's the only solution I see. (Or use an xattr?)
>
> Or... return the UUID as part of the file handle in the first place.
> That avoids races, avoids adding more syscalls that have to be called
> for each file handle, and avoids what is IMNSHO the worst proposal:
> requiring applications to parse a text file in some obscure path for
> each file handle (which means a stat() to find the major/minor device
> of the file, walking through /proc or /sys, and other nastiness).

I would also like to get both file system identifier and file identifier
in a single call.

That would also mean that, instead of the above sequence of four calls,
we can do:

file_handle = name_to_handle(name);
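
For comparison, a minimal C sketch using name_to_handle_at(2), the form in which this interface was later merged into Linux (2.6.39) and exposed by glibc. It differs from the single-argument call sketched in this mail: the mount ID comes back through a separate int * argument rather than inside the handle. The wrapper name path_to_handle is hypothetical, and the sketch assumes the filesystem supports exporting handles (otherwise the call fails, typically with EOPNOTSUPP):

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>

/*
 * Return a malloc'd handle for path and store the mount ID in *mount_id.
 * Returns NULL with errno set on failure. The caller frees the handle.
 */
struct file_handle *path_to_handle(const char *path, int *mount_id)
{
    struct file_handle *fh, *tmp;
    unsigned int bytes;
    int ret, saved;

    fh = malloc(sizeof(*fh));
    if (!fh)
        return NULL;

    /* Probe call: with handle_bytes == 0 the kernel fails with EOVERFLOW
     * and writes the required size back into handle_bytes. */
    fh->handle_bytes = 0;
    ret = name_to_handle_at(AT_FDCWD, path, fh, mount_id, 0);
    if (ret != -1 || errno != EOVERFLOW) {
        saved = (ret != -1) ? EINVAL : errno;
        free(fh);
        errno = saved;
        return NULL;
    }

    /* Grow the buffer to hold the opaque f_handle bytes and retry. */
    bytes = fh->handle_bytes;
    tmp = realloc(fh, sizeof(*fh) + bytes);
    if (!tmp) {
        free(fh);
        errno = ENOMEM;
        return NULL;
    }
    fh = tmp;

    if (name_to_handle_at(AT_FDCWD, path, fh, mount_id, 0) == -1) {
        saved = errno;
        free(fh);
        errno = saved;
        return NULL;
    }
    return fh;
}
```

One pathname lookup yields both the file identifier and a filesystem-side identifier, which is the single-call property asked for above.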

-aneesh



From: Andreas Dilger on
On 2010-07-07, at 12:05, Nick Piggin wrote:
> On Wed, Jul 07, 2010 at 11:02:47AM -0600, Andreas Dilger wrote:
>> I think you are assuming too much about the use of the file handle. What I'm interested in is not a userspace file server, but rather a more efficient way to have 10000's to millions of clients to be able to open the same regular file, without having to do full path traversal for each one.
>
> Really? What kind of clients? What sort of speedups do you hope to see?
> Path traversal can get vastly cheaper in both single threaded and parallel
> cases with my locking changes.

This is for Lustre clients, but really any kind of network filesystem is equally affected. This isn't really an issue of the local dcache performance, but rather network latency for each component of the path traversal, and for a large number of clients the metadata server is the bottleneck for doing the traversal.

> It is not acceptable to work around fixable deficiencies in our critical
> infrastructure like path walking with hacks like this. If path walking
> is still much too expensive, that's another story...

Two different problems, I'm afraid.

Cheers, Andreas
--
Andreas Dilger
Lustre Technical Lead
Oracle Corporation Canada Inc.

From: Aneesh Kumar K. V on
On Thu, 8 Jul 2010 08:21:43 +1000, Neil Brown <neilb(a)suse.de> wrote:
> On Wed, 7 Jul 2010 10:45:11 -0400
> "J. Bruce Fields" <bfields(a)fieldses.org> wrote:
>
> > On Wed, Jul 07, 2010 at 03:35:50PM +0200, Miklos Szeredi wrote:
> > > On Wed, 7 Jul 2010, J. Bruce Fields wrote:
> > > > > > If you use sys or proc, is it possible to get the uuid from a file
> > > > > > descriptor or pathname without races?
> > > > >
> > > > > You can do stat/fstat to find out the device number (which is unique,
> > > > > but not persistent)
> > > >
> > > > Is it really unique over time? (Can't a given st_dev value map to one
> > > > filesystem now, and another later?)
> > >
> > > It's unique at a single point in time. But if you have a reference
> > > (e.g. open file descriptor) on the mount then that's not a problem.
> > >
> > > fd = open(path, ...);
> > > fstat(fd, &st);
> > > search st.st_dev in mountinfo
> > > close(fd)
> > >
> > > is effectively the same as a getuuid(path) syscall (lazy unmounted
> > > filesystems will not be found in mountinfo, but the reference is still
> > > there so st_dev will not be reused for other filesystems).
> >
> > OK, cool.
> >
> > That still leaves the problem that there isn't always an underlying
> > block device, and/or when there is it doesn't always uniquely specify
> > the filesystem.
>
> It doesn't matter if there is an underlying block device, or if it is shared
> among subvolumes.
> st_dev is *the* primary key for filesystems. Every "struct super_block" has a
> unique s_dev and that is returned in st_dev.
>
> For "traditional" filesystem, this is the major/minor number of the block
> device.
> For NFS and btrfs and other filesystems which don't have exclusive use of a
> block device, 'set_anon_super' is used to get a unique s_dev based on a major
> number of '0'.
>
> So you can *always* use st_dev as an identifier for the filesystem which is
> stable and unique as long as you hold an active reference to the filesystem
> (open file descriptor, cwd in fs, etc).
>
> If you poll(2) /proc/mounts to get notifications of changes to the mount
> table, then it should be quite easy to cache st_dev -> uuid mappings in a
> race-free way.
>
> There might be value in getting name_to_handle to return the st_dev of the
> target file to ensure that you haven't unexpectedly crossed into a different
> filesystem. I would prefer that to returning a uuid: st_dev is guaranteed
> to be unique, a uuid is only supposed to be unique (i.e. that is not
> enforced).

How about adding mnt_id to the handle? The documentation says it is
unique:

(1) mount ID: unique identifier of the mount (may be reused after umount)

I also updated /proc/self/mountinfo to carry an optional uuid field.
With the patch below, I get the following in /proc/self/mountinfo:

13 1 253:0 / / rw,relatime,uuid:9b5af62a-a34a-43f6-a5bb-1cc22d97e862 - ext3 /dev/root rw,errors=continue,barrier=0,data=writeback

And the handle returns the value 13 in the mnt_id field. We should be able
to look up mountinfo by mnt_id and find the corresponding uuid.
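
A minimal C sketch of that lookup for a single mountinfo line, assuming the ",uuid:<value>" text is emitted before the " - " separator as in the sample line above. That field exists only with the patch in this mail, not in a stock kernel, and the function name mountinfo_uuid is hypothetical:

```c
#include <stdio.h>
#include <string.h>

/*
 * If line is a mountinfo entry whose first field equals mnt_id and which
 * carries ",uuid:<value>" before the " - " separator, copy <value> into
 * uuid (NUL-terminated) and return 0. Return -1 otherwise.
 *
 * Simplification: a mount point path that itself contains ",uuid:" would
 * confuse this search.
 */
int mountinfo_uuid(const char *line, int mnt_id, char *uuid, size_t len)
{
    const char *sep, *p;
    size_t n;
    int id;

    if (sscanf(line, "%d", &id) != 1 || id != mnt_id)
        return -1;

    sep = strstr(line, " - ");      /* end of the per-mount fields */
    p = strstr(line, ",uuid:");
    if (!sep || !p || p > sep)
        return -1;

    p += strlen(",uuid:");
    n = strcspn(p, ", ");           /* value ends at a comma or space */
    if (n == 0 || n >= len)
        return -1;

    memcpy(uuid, p, n);
    uuid[n] = '\0';
    return 0;
}
```

A caller would read /proc/self/mountinfo line by line and pass each line together with the mnt_id taken from the handle until one returns 0.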

diff --git a/fs/namespace.c b/fs/namespace.c
index 88058de..498bd9a 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -871,6 +871,9 @@ static int show_mountinfo(struct seq_file *m, void *v)
 	if (IS_MNT_UNBINDABLE(mnt))
 		seq_puts(m, " unbindable");
 
+	/* print the uuid */
+	seq_printf(m, ",uuid:%pU", mnt->mnt_sb->s_uuid);
+
 	/* Filesystem specific data */
 	seq_puts(m, " - ");
 	show_type(m, sb);
diff --git a/fs/open.c b/fs/open.c
index 23d05d3..13d426e 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -1092,6 +1092,8 @@ static long do_sys_name_to_handle(struct path *path,
 	handle_size *= sizeof(u32);
 	handle->handle_type = retval;
 	handle->handle_size = handle_size;
+	/* copy the mount id */
+	handle->mnt_id = path->mnt->mnt_id;
 	if (handle_size > f_handle.handle_size) {
 		/*
 		 * set the handle_size to zero so we copy only
diff --git a/include/linux/fs.h b/include/linux/fs.h
index ffcb9bf..5f43472 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -952,6 +952,7 @@ struct file {
 };
 
 struct file_handle {
+	int mnt_id;
 	int handle_size;
 	int handle_type;
 	/* file identifier */

-aneesh