Generic name to handle and open by handle syscalls [Kernel]

From: Dave Chinner on 4 Jun 2010 01:50

On Thu, Jun 03, 2010 at 09:44:06PM +0530, Aneesh Kumar K.V wrote:
> Hi,
>
> The below set of patches implements open-by-handle support using exportfs
> operations. This allows a user-space application to map a file name to a file
> handle and later open the file using the handle. This should be usable
> for userspace NFS [1] and 9P server [2]. XFS already supports this with the ioctls
> XFS_IOC_PATH_TO_HANDLE and XFS_IOC_OPEN_BY_HANDLE.
......
> Example program: (x86_32). (x86_64 would need a different syscall number)

[snip test program]

Just a thought - can you write a set of tests using this (or
similar) test program and integrate them into xfstests?

That will help ensure correctness when implementing the API in other
filesystems (e.g. XFS ;), as well as provide some level of assurance
that we'll notice when we break the code...

Cheers,

Dave.

--
Dave Chinner
david(a)fromorbit.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

From: Neil Brown on 1 Jul 2010 16:50

On Thu, 01 Jul 2010 21:58:54 +0530
"Aneesh Kumar K. V" <aneesh.kumar(a)linux.vnet.ibm.com> wrote:

> On Tue, 15 Jun 2010 22:42:50 +0530, "Aneesh Kumar K.V" <aneesh.kumar(a)linux.vnet.ibm.com> wrote:
>
> Hi Al,
>
> Any chance of getting this reviewed/merged in the next merge window ?

My own opinion of the patchset is that the code itself is fine;
however, there is one part of the interface that bothers me.

I think that it is a little ugly that filesystem uuid extraction is so
closely tied to filehandle manipulation. They are certainly related, and we
certainly need to be able to get the filesystem uuid directly from the
filesystem, but given that filehandle -> fd mapping doesn't (and shouldn't)
use the uuid, the fact that fd/name -> filehandle mapping does return the
uuid looks like it is simply piggybacking some functionality on the side,
rather than creating a properly designed and general interface.

I would feel happier about the patches if you removed all reference to uuids
and then found some other way to ask a filesystem what its uuid was.

This is not an issue that would make me want to stop the patches going
upstream, but it does hold me back from offering a reviewed-by or
acked-by (for whatever they might be worth).

NeilBrown

From: hch on 2 Jul 2010 03:10

On Thu, Jul 01, 2010 at 10:02:29PM -0600, Andreas Dilger wrote:
> I'd like to be able to use this interface to implement the distributed open call proposed by the POSIX HECWG. This allows one client to do the path traversal, broadcast the file handle to the (maybe) 1M processes in the job via MPI, and then the other clients can open the file by handle without doing 1M times the full path traversal (which might be 10's of RPCs per process).

The proposal is doomed anyway. If we allow any sort of open by handle
system call for unprivileged users we need to reconnect the dentry
to the dcache path anyway (reconnect_path), which is more expensive than
a normal path lookup.

From: Neil Brown on 2 Jul 2010 18:10

On Fri, 2 Jul 2010 10:12:47 -0600
Andreas Dilger <andreas.dilger(a)oracle.com> wrote:

> On 2010-07-02, at 01:05, hch(a)infradead.org wrote:
> > On Thu, Jul 01, 2010 at 10:02:29PM -0600, Andreas Dilger wrote:
> >> I'd like to be able to use this interface to implement the distributed open call proposed by the POSIX HECWG. This allows one client to do the path traversal, broadcast the file handle to the (maybe) 1M processes in the job via MPI, and then the other clients can open the file by handle without doing 1M times the full path traversal (which might be 10's of RPCs per process).
> >
> > The proposal is doomed anyway. If we allow any sort of open by handle
> > system call for unprivileged users we need to reconnect the dentry
> > to the dcache path anyway (reconnect_path), which is more expensive than
> > a normal path lookup.
>
> I haven't looked at this part of the VFS in a while, but it looks like an implementation issue specific to knfsd, and shouldn't be needed for regular files. i.e. if exportfs_encode_fh() is never used on a disconnected file, then this overhead is not incurred.
>
> The above use of open_by_handle() is not for userspace NFS/Samba re-export, but to allow applications to open regular files for IO.
>

From my recollection of implementing dentry reconnection there are two
needs for it.

Firstly it is needed for directories so that the VFS can effectively lock
against directory rename races which could otherwise create disconnected
subtrees (where the first parent is a member only of one of its
descendants). So if you get a filehandle for a directory it *must* be
properly connected to the root for rename to be safe. This operation is
faster than a full path lookup if the dentry is already in cache, and slower
if it or any part of the path is not in cache.
You could possibly delay the full-connection of the dentry until the first
attempt to rename beneath it. I'm not sure how much VFS surgery that would
require.
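
To make the cost argument above concrete, here is a toy sketch (not kernel
code) of checking whether a node is connected to the filesystem root by
walking parent pointers. The struct toy_dentry type and toy_is_connected()
are hypothetical names invented for this illustration. The sketch assumes
that the top of any tree, connected or not, is its own parent, loosely like
the kernel's IS_ROOT() test. The work is proportional to the depth of the
node, and in a real filesystem each uncached step would need a lookup of
the parent on disk.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a dentry: only the parent link matters here. */
struct toy_dentry {
    struct toy_dentry *parent;  /* the top of a tree points at itself */
};

/*
 * Walk up from d until a node that is its own parent is reached.
 * The node is "connected" only if that top node is the real root.
 * *steps (if non-NULL) receives the number of parent hops taken.
 */
static bool toy_is_connected(const struct toy_dentry *d,
                             const struct toy_dentry *fs_root,
                             size_t *steps)
{
    size_t hops = 0;

    while (d->parent != d) {
        d = d->parent;
        hops++;
    }
    if (steps)
        *steps = hops;
    return d == fs_root;
}
```

A directory obtained from a filehandle whose walk ends somewhere other than
the real root is the disconnected case that rename has to be protected
against.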

Secondly it is needed if you want to enforce the rule that the contents of a
directory are only accessible if the 'x' bit on the directory is set.
kNFSd does not enforce this (unless subtree_check is specified), partly
because it is hard to do correctly and partly because we have to trust the
client anyway, so trusting it to check the 'x' bit is very little extra trust.
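
The 'x' bit rule can be illustrated with a simplified sketch. A normal path
lookup checks search permission on every ancestor directory, whereas a
filehandle lookup goes straight to the inode and never sees the ancestors
unless the dentry is fully reconnected. Everything below is hypothetical
and invented for illustration: struct toy_dir, the TOY_X_* constants and
both functions. The check deliberately ignores supplementary groups, ACLs
and capabilities.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy execute/search permission bits, mirroring the usual octal layout. */
enum {
    TOY_X_USR = 0100,
    TOY_X_GRP = 0010,
    TOY_X_OTH = 0001
};

/* Ownership and mode of one ancestor directory. */
struct toy_dir {
    unsigned int uid;
    unsigned int gid;
    unsigned int mode;
};

/* Classic owner/group/other selection: only the matching class is tested. */
static bool toy_may_search(const struct toy_dir *d,
                           unsigned int uid, unsigned int gid)
{
    if (d->uid == uid)
        return (d->mode & TOY_X_USR) != 0;
    if (d->gid == gid)
        return (d->mode & TOY_X_GRP) != 0;
    return (d->mode & TOY_X_OTH) != 0;
}

/* A path walk succeeds only if every ancestor grants search permission. */
static bool toy_path_walk_allowed(const struct toy_dir *ancestors, size_t n,
                                  unsigned int uid, unsigned int gid)
{
    for (size_t i = 0; i < n; i++) {
        if (!toy_may_search(&ancestors[i], uid, gid))
            return false;
    }
    return true;
}
```

With a 0755 directory owned by root above a 0700 directory owned by uid
1000, the walk is allowed for uid 1000 and refused for uid 1001. A handle
based open that skips the walk would let uid 1001 reach a file that is
otherwise hidden behind the 0700 directory.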

Note that it is not possible to reliably perform filehandle lookup for
non-directories if you need a fully reconnected dentry, as
cross-directory-renames can confuse the situation beyond recovery.

Maybe open-by-handle should require DAC_OVERRIDE, or maybe a new
DAC_X_OVERRIDE. And if those aren't provided it only works for directories.
???

NeilBrown

From: J. Bruce Fields on 6 Jul 2010 12:20

On Fri, Jul 02, 2010 at 02:45:45AM +0530, Aneesh Kumar K. V wrote:
> One use case I had was that if the userspace file server can directly
> work with the returned file system UUID,

I agree that the uuid should be split out from the rest of the
filehandle, but ...

> then it can build the file
> handle for client in a single call.

... I don't understand why both need to come in the same system call.
Is it purely an efficiency question? If so, why do you expect this to
be significant?

(I would have thought that the system call overhead is so small, and so
many calls will already be required to perform the typical rpc, that
this would be insignificant.)

A filesystem uuid seems like a generally useful thing (maybe more so
than a filehandle), so it'd seem worth figuring out how to export that
separately.

--b.