From: Aneesh Kumar K. V
On Fri, 4 Jun 2010 15:43:28 +1000, Dave Chinner <david(a)fromorbit.com> wrote:
> On Thu, Jun 03, 2010 at 09:44:06PM +0530, Aneesh Kumar K.V wrote:
> > Hi,
> >
> > The below set of patches implements open-by-handle support using exportfs
> > operations. This allows a user space application to map a file name to a file
> > handle and later open the file using the handle. This should be usable
> > for userspace NFS [1] and 9P servers [2]. XFS already supports this with the ioctls
> > XFS_IOC_PATH_TO_HANDLE and XFS_IOC_OPEN_BY_HANDLE.
> .....
> > Example program: (x86_32). (x86_64 would need a different syscall number)
>
> [snip test program]
>
> Just a thought - can you write a set of tests using this (or
> similar) test program and integrate them into xfstests?
>
> That will help ensure correctness when implementing the API in other
> filesystems (e.g. XFS ;), as well as provide some level of assurance
> that we'll notice when we break the code...
>

Will do. I also have tests that exercise different open flags with
open_by_handle_at. I will add those as well.

-aneesh
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Aneesh Kumar K. V
On Thu, 3 Jun 2010 21:44:06 +0530, "Aneesh Kumar K.V" <aneesh.kumar(a)linux.vnet.ibm.com> wrote:
> Hi,
>
> The below set of patches implements open-by-handle support using exportfs
> operations. This allows a user space application to map a file name to a file
> handle and later open the file using the handle. This should be usable
> for userspace NFS [1] and 9P servers [2]. XFS already supports this with the ioctls
> XFS_IOC_PATH_TO_HANDLE and XFS_IOC_OPEN_BY_HANDLE.
>
> [1] http://nfs-ganesha.sourceforge.net/
> [2] http://thread.gmane.org/gmane.comp.emulators.qemu/68992
>
> Changes from V12:
> a) Use CAP_DAC_READ_SEARCH instead of CAP_DAC_OVERRIDE in open_by_handle
> b) Return -ENOTDIR if the O_DIRECTORY flag is specified in open_by_handle with
> a handle for a non-directory
>
> Changes from V11:
> a) Add necessary documentation to different functions
> b) Add null pathname support to faccessat and linkat similar to
> readlinkat.
> c) compile fix on x86_64
>
> Changes from V10:
> a) Missed an stg refresh before sending out the patchset. Send
> updated patchset.
>
> Changes from V9:
> a) Fix compile errors with CONFIG_EXPORTFS not defined
> b) Return -EOPNOTSUPP if the file system doesn't support the fh_to_dentry exportfs callback.
>
> Changes from V8:
> a) exportfs_decode_fh now returns -ESTALE if export operations are not defined.
> b) drop get_fsid super_operations. Instead use superblock to store uuid.
>
> Changes from V7:
> a) open_by_handle now uses mountdirfd to identify the vfsmount.
> b) We don't validate the UUID passed as a part of the file handle in open_by_handle.
> The UUID is provided as a part of the file handle as an easy way for userspace to
> use the kernel-returned handle as it is. It also helps in finding the 16-byte
> filesystem UUID in userspace without using file-system-specific libraries to
> read the file system superblock. If a particular file system doesn't support a UUID
> or any form of unique id, this field in the file handle will be zero-filled.
> c) drop freadlink syscall. Instead use readlinkat with a NULL pathname to
> read the target of the link pointed to by fd. This is similar to
> sys_utimensat
> d) Instead of open-coding all the open-flag-related checks, use helper functions.
> Did finish_open_by_handle similar to finish_open.
> e) Fix may_open to not return ELOOP for a symlink when we are called from handle open.
> open(2) still returns an error as expected.
>
> Changes from V6:
> a) Add uuid to vfsmount lookup and drop uuid to superblock lookup
> b) Return -EOPNOTSUPP in sys_name_to_handle if the uuid returned by the file system
> doesn't give the same vfsmount on lookup. This ensures that we fail
> sys_name_to_handle when we have multiple file systems returning the same UUID.
>
> Changes from V5:
> a) added sys_name_to_handle_at syscall which takes AT_SYMLINK_NOFOLLOW flag
> instead of two syscalls sys_name_to_handle and sys_lname_to_handle.
> b) addressed review comments from Neil Brown
> c) rebased to b91ce4d14a21fc04d165be30319541e0f9204f15
> d) Add compat_sys_open_by_handle
>
> Changes from V4:
> a) Changed the syscall arguments so that we don't need compat syscalls
> as suggested by Christoph
> c) Added two new syscalls, sys_lname_to_handle and sys_freadlink, to work with
> symlinks
> d) Changed open_by_handle to work with all file types
> e) Add ext3 support
>
> Changes from V3:
> a) Code cleanup suggested by Andreas
> b) x86_64 syscall support
> c) add compat syscall
>
> Changes from V2:
> a) Support system wide unique handle.
>
> Changes from v1:
> a) handle size is now specified in bytes
> b) returns -EOVERFLOW if the handle size is small
> c) dropped open_handle syscall and added open_by_handle_at syscall
> open_by_handle_at takes mount_fd as the directory fd of the mount point
> containing the file
> e) handle will only be unique in a given file system. So for an NFS server
> exporting multiple file systems, the NFS server will have to internally track the
> mount point to which a file handle belongs. We should be able to do that much
> more easily than expecting the kernel to give a system-wide unique file handle. A
> system-wide unique file handle would need much larger changes to the exportfs or VFS
> interface, and I was not sure whether we really need to do that in the kernel or
> in user space
> f) open_handle_at now only checks for the DAC_OVERRIDE capability
>
>
> Example program: (x86_32). (x86_64 would need a different syscall number)
> -------
> cc <source.c> -luuid
> --------
> #define _GNU_SOURCE
> #include <stdio.h>
> #include <stdlib.h>
>
> #include <fcntl.h>
> #include <unistd.h>
> #include <errno.h>
> #include <sys/types.h>
> #include <sys/stat.h>
> #include <string.h>
> #include <uuid/uuid.h>
>
> struct file_handle {
> 	int handle_size;
> 	int handle_type;
> 	uuid_t fs_uuid;
> 	unsigned char handle[0];
> };
>
> #ifndef AT_FDCWD
> #define AT_FDCWD -100
> #endif
> #ifndef AT_SYMLINK_FOLLOW
> #define AT_SYMLINK_FOLLOW 0x400
> #endif
>
> /* 338 and 339 are the x86_32 syscall numbers used by this patch set. */
> static int name_to_handle(const char *name, struct file_handle *fh)
> {
> 	return syscall(338, AT_FDCWD, name, fh, AT_SYMLINK_FOLLOW);
> }
>
> static int lname_to_handle(const char *name, struct file_handle *fh)
> {
> 	return syscall(338, AT_FDCWD, name, fh, 0);
> }
>
> static int open_by_handle(int mountfd, struct file_handle *fh, int flags)
> {
> 	return syscall(339, mountfd, fh, flags);
> }
>
> #define BUFSZ 100
> int main(int argc, char *argv[])
> {
> 	int fd;
> 	int ret;
> 	int mountfd;
> 	int handle_sz;
> 	struct stat bufstat;
> 	char buf[BUFSZ];
> 	char uuid[37];	/* uuid_unparse() writes 36 characters plus a NUL */
> 	struct file_handle *fh = NULL;
>
> 	if (argc != 3) {
> 		printf("Usage: %s <filename> <mount-dir-name>\n", argv[0]);
> 		exit(1);
> 	}
> again:
> 	/* Probe with a zero-sized handle first; on EOVERFLOW the kernel
> 	 * stores the required size in handle_size and we retry. */
> 	handle_sz = fh ? fh->handle_size : 0;
> 	free(fh);
> 	fh = malloc(sizeof(struct file_handle) + handle_sz);
> 	if (!fh) {
> 		perror("malloc");
> 		exit(1);
> 	}
> 	fh->handle_size = handle_sz;
> 	errno = 0;
> 	ret = lname_to_handle(argv[1], fh);
> 	if (ret && errno == EOVERFLOW) {
> 		printf("Found the handle size needed to be %d\n", fh->handle_size);
> 		goto again;
> 	} else if (ret) {
> 		perror("Error:");
> 		exit(1);
> 	}
> 	uuid_unparse(fh->fs_uuid, uuid);
> 	printf("UUID:%s\n", uuid);
> 	printf("Waiting for input");
> 	getchar();
> 	mountfd = open(argv[2], O_RDONLY | O_DIRECTORY);
> 	if (mountfd < 0) {	/* 0 is a valid descriptor */
> 		perror("Error:");
> 		exit(1);
> 	}
> 	fd = open_by_handle(mountfd, fh, O_RDONLY);
> 	if (fd < 0) {
> 		perror("Error:");
> 		exit(1);
> 	}
> 	printf("Reading the content now \n");
> 	fstat(fd, &bufstat);
> 	ret = S_ISLNK(bufstat.st_mode);
> 	if (ret) {
> 		memset(buf, 0, BUFSZ);
> 		/* NULL pathname: read the target of the link fd refers to.
> 		 * Leave one byte so buf stays NUL-terminated. */
> 		readlinkat(fd, NULL, buf, BUFSZ - 1);
> 		printf("%s is a symlink pointing to %s\n", argv[1], buf);
> 	}
> 	memset(buf, 0, BUFSZ);
> 	while (1) {
> 		ret = read(fd, buf, BUFSZ - 1);
> 		if (ret <= 0)
> 			break;
> 		buf[ret] = '\0';
> 		printf("%s", buf);
> 		memset(buf, 0, BUFSZ);
> 	}
> 	/* Now check for faccess */
> 	if (faccessat(fd, NULL, W_OK, 0) == 0) {
> 		printf("Got write permission on the file \n");
> 	} else
> 		perror("faccess error");
> 	/* now try to create a hardlink */
> 	if (linkat(fd, NULL, AT_FDCWD, "test", 0) == 0) {
> 		printf("created hardlink\n");
> 	} else
> 		perror("linkat error");
> 	return 0;
> }
>
>

git tree for this patch series is available at

http://git.kernel.org/?p=linux/kernel/git/kvaneesh/linux-open-handle.git

git://git.kernel.org/pub/scm/linux/kernel/git/kvaneesh/linux-open-handle.git open-by-handle-v13


-aneesh
From: Aneesh Kumar K. V
On Tue, 15 Jun 2010 22:42:50 +0530, "Aneesh Kumar K.V" <aneesh.kumar(a)linux.vnet.ibm.com> wrote:

Hi Al,

Any chance of getting this reviewed/merged in the next merge window?

-aneesh

> Hi,
>
> The below set of patches implements open-by-handle support using exportfs
> operations. This allows a user space application to map a file name to a file
> handle and later open the file using the handle. This should be usable
> for userspace NFS [1] and 9P servers [2]. XFS already supports this with the ioctls
> XFS_IOC_PATH_TO_HANDLE and XFS_IOC_OPEN_BY_HANDLE.
>
> [1] http://nfs-ganesha.sourceforge.net/
> [2] http://thread.gmane.org/gmane.comp.emulators.qemu/68992
>
> git repo for the patchset at:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/kvaneesh/linux-open-handle.git open-by-handle-v14
>
> Changes from V13:
> a) Add support for file descriptor to handle conversion. This is needed
> so that we find the right file handle for newly created files.
>
>
> Example program: (x86_32). (x86_64 would need a different syscall number)
> -------
> cc <source.c> -luuid
> --------
> #define _GNU_SOURCE
> #include <stdio.h>
> #include <stdlib.h>
>
> #include <fcntl.h>
> #include <unistd.h>
> #include <errno.h>
> #include <sys/types.h>
> #include <sys/stat.h>
> #include <string.h>
> #include <uuid/uuid.h>
>
> struct file_handle {
> 	int handle_size;
> 	int handle_type;
> 	uuid_t fs_uuid;
> 	unsigned char handle[0];
> };
>
> #ifndef AT_FDCWD
> #define AT_FDCWD -100
> #endif
> #ifndef AT_SYMLINK_FOLLOW
> #define AT_SYMLINK_FOLLOW 0x400
> #endif
>
> /* 338 and 339 are the x86_32 syscall numbers used by this patch set. */
> static int name_to_handle(const char *name, struct file_handle *fh)
> {
> 	return syscall(338, AT_FDCWD, name, fh, AT_SYMLINK_FOLLOW);
> }
>
> static int lname_to_handle(const char *name, struct file_handle *fh)
> {
> 	return syscall(338, AT_FDCWD, name, fh, 0);
> }
>
> /* New in V14: a NULL pathname asks for the handle of the file fd refers to. */
> static int fd_to_handle(int fd, struct file_handle *fh)
> {
> 	return syscall(338, fd, NULL, fh, AT_SYMLINK_FOLLOW);
> }
>
> static int open_by_handle(int mountfd, struct file_handle *fh, int flags)
> {
> 	return syscall(339, mountfd, fh, flags);
> }
>
> #define BUFSZ 100
> int main(int argc, char *argv[])
> {
> 	int fd;
> 	int ret, done = 0;
> 	int mountfd;
> 	int handle_sz;
> 	struct stat bufstat;
> 	char buf[BUFSZ];
> 	char uuid[37];	/* uuid_unparse() writes 36 characters plus a NUL */
> 	struct file_handle *fh = NULL;
>
> 	if (argc != 3) {
> 		printf("Usage: %s <filename> <mount-dir-name>\n", argv[0]);
> 		exit(1);
> 	}
> again:
> 	/* Probe with a zero-sized handle first; on EOVERFLOW the kernel
> 	 * stores the required size in handle_size and we retry. */
> 	handle_sz = fh ? fh->handle_size : 0;
> 	free(fh);
> 	fh = malloc(sizeof(struct file_handle) + handle_sz);
> 	if (!fh) {
> 		perror("malloc");
> 		exit(1);
> 	}
> 	fh->handle_size = handle_sz;
> 	errno = 0;
> 	ret = lname_to_handle(argv[1], fh);
> 	if (ret && errno == EOVERFLOW) {
> 		printf("Found the handle size needed to be %d\n", fh->handle_size);
> 		goto again;
> 	} else if (ret) {
> 		perror("Error:");
> 		exit(1);
> 	}
> do_again:
> 	/* Second pass (done == 1) repeats the checks with the fd-derived handle. */
> 	uuid_unparse(fh->fs_uuid, uuid);
> 	printf("UUID:%s\n", uuid);
> 	printf("Waiting for input");
> 	getchar();
> 	mountfd = open(argv[2], O_RDONLY | O_DIRECTORY);
> 	if (mountfd < 0) {	/* 0 is a valid descriptor */
> 		perror("Error:");
> 		exit(1);
> 	}
> 	fd = open_by_handle(mountfd, fh, O_RDONLY);
> 	if (fd < 0) {
> 		perror("Error:");
> 		exit(1);
> 	}
> 	printf("Reading the content now \n");
> 	fstat(fd, &bufstat);
> 	ret = S_ISLNK(bufstat.st_mode);
> 	if (ret) {
> 		memset(buf, 0, BUFSZ);
> 		/* NULL pathname: read the target of the link fd refers to.
> 		 * Leave one byte so buf stays NUL-terminated. */
> 		readlinkat(fd, NULL, buf, BUFSZ - 1);
> 		printf("%s is a symlink pointing to %s\n", argv[1], buf);
> 	}
> 	memset(buf, 0, BUFSZ);
> 	while (1) {
> 		ret = read(fd, buf, BUFSZ - 1);
> 		if (ret <= 0)
> 			break;
> 		buf[ret] = '\0';
> 		printf("%s", buf);
> 		memset(buf, 0, BUFSZ);
> 	}
> 	/* Now check for faccess */
> 	if (faccessat(fd, NULL, W_OK, 0) == 0) {
> 		printf("Got write permission on the file \n");
> 	} else
> 		perror("faccess error");
> 	/* now try to create a hardlink */
> 	if (linkat(fd, NULL, AT_FDCWD, "test", 0) == 0) {
> 		printf("created hardlink\n");
> 	} else
> 		perror("linkat error");
> 	if (done)
> 		exit(0);
> 	printf("Map fd to handle \n");
> 	ret = fd_to_handle(fd, fh);
> 	if (ret) {
> 		perror("Error:");
> 		exit(1);
> 	}
> 	done = 1;
> 	goto do_again;
> }
>
> -aneesh
From: Aneesh Kumar K. V
On Fri, 2 Jul 2010 06:41:08 +1000, Neil Brown <neilb(a)suse.de> wrote:
> On Thu, 01 Jul 2010 21:58:54 +0530
> "Aneesh Kumar K. V" <aneesh.kumar(a)linux.vnet.ibm.com> wrote:
>
> > On Tue, 15 Jun 2010 22:42:50 +0530, "Aneesh Kumar K.V" <aneesh.kumar(a)linux.vnet.ibm.com> wrote:
> >
> > Hi Al,
> >
> > Any chance of getting this reviewed/merged in the next merge window?
>
> My own opinion of the patchset is that the code itself is fine,
> however there is one part of the interface that bothers me.
>
> I think that it is a little ugly that filesystem uuid extraction is so
> closely tied to filehandle manipulation. They are certainly related, and we
> certainly need to be able to get the filesystem uuid directly from the
> filesystem, but given that filehandle -> fd mapping doesn't (and shouldn't)
> use the uuid, the fact that fd/name -> filehandle mapping does return the
> uuid looks like it is simply piggybacking some functionality on the side,
> rather than creating a properly designed and general interface.
>
> I would feel happier about the patches if you removed all reference to uuids
> and then found some other way to ask a filesystem what its uuid was.
>
> This is not an issue that would make me want to stop the patches going
> upstream, but it does hold me back from offering a reviewed-by or
> acked-by (for whatever they might be worth).
>

One use case I had in mind: if the userspace file server can work directly
with the returned file system UUID, then it can build the file handle for the
client in a single call.
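
As a rough illustration of that use case (not code from the patch set): a
server that exports several file systems could key its open mount fds by the
UUID the kernel returns, so that a handle coming back from a client can be
routed to the right mountfd for open_by_handle. The names mount_table,
mount_table_add and mount_table_lookup and the MAX_MOUNTS limit are made up
for this sketch; only the 16-byte UUID size comes from the example program.

```c
#include <assert.h>
#include <string.h>

#define FS_UUID_LEN 16	/* size of uuid_t, as used in the example program */
#define MAX_MOUNTS  8	/* hypothetical limit for this sketch */

/* One exported file system: its UUID and an fd open on its mount point. */
struct mount_entry {
	unsigned char fs_uuid[FS_UUID_LEN];
	int mountfd;
};

struct mount_table {
	struct mount_entry ent[MAX_MOUNTS];
	int count;
};

/* Remember which mount fd serves a given file system UUID.
 * Returns 0 on success, -1 if the table is full. */
static int mount_table_add(struct mount_table *t,
			   const unsigned char *fs_uuid, int mountfd)
{
	assert(mountfd >= 0);
	if (t->count >= MAX_MOUNTS)
		return -1;
	memcpy(t->ent[t->count].fs_uuid, fs_uuid, FS_UUID_LEN);
	t->ent[t->count].mountfd = mountfd;
	t->count++;
	return 0;
}

/* Find the mount fd for the UUID carried in a client handle.
 * Returns -1 if the UUID is unknown to this server. */
static int mount_table_lookup(const struct mount_table *t,
			      const unsigned char *fs_uuid)
{
	int i;

	for (i = 0; i < t->count; i++)
		if (memcmp(t->ent[i].fs_uuid, fs_uuid, FS_UUID_LEN) == 0)
			return t->ent[i].mountfd;
	return -1;
}
```

A file system that reports a zero-filled UUID would not be distinguishable
from another such file system in a table like this, so a real server would
need a fallback for that case.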


-aneesh
From: Andreas Dilger
On 2010-07-01, at 14:41, Neil Brown <neilb(a)suse.de> wrote:
> I think that it is a little ugly that filesystem uuid extraction is so
> closely tied to filehandle manipulation. They are certainly related, and we
> certainly need to be able to get the filesystem uuid directly from the
> filesystem, but given that filehandle -> fd mapping doesn't (and shouldn't)
> use the uuid, the fact that fd/name -> filehandle mapping does return the
> uuid looks like it is simply piggybacking some functionality on the side,
> rather than creating a properly designed and general interface.

I disagree. Getting the UUID as part of the filehandle avoids an extra system call for the client and avoids the need for some other non-standard interface to get the UUID.

I'd like to be able to use this interface to implement the distributed open call proposed by the POSIX HECWG. This allows one client to do the path traversal and broadcast the file handle to the (maybe) 1M processes in the job via MPI; the other clients can then open the file by handle without repeating the full path traversal 1M times (which might be tens of RPCs per process).
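
A minimal sketch of the handle-passing half of that idea, assuming the struct file_handle layout from the example program in this thread and a homogeneous job (same endianness and int size on every node). The handle is copied into a fixed-size buffer so that every rank can post the same receive size, and the receiver rebuilds a struct it could pass to open_by_handle. MAX_HANDLE_SZ, pack_handle and unpack_handle are hypothetical names, and the MPI call itself is left out.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Layout from the example program in this thread (proposed interface). */
struct file_handle {
	int handle_size;
	int handle_type;
	unsigned char fs_uuid[16];	/* uuid_t is a 16-byte array */
	unsigned char handle[];
};

#define MAX_HANDLE_SZ 128	/* hypothetical cap so all ranks use one size */
#define MSG_SZ (sizeof(struct file_handle) + MAX_HANDLE_SZ)

/* Sender: copy header and opaque bytes into a zero-padded message.
 * Returns 0 on success, -1 if the handle does not fit. */
static int pack_handle(const struct file_handle *fh, unsigned char msg[MSG_SZ])
{
	assert(fh != NULL && msg != NULL);
	if (fh->handle_size < 0 || fh->handle_size > MAX_HANDLE_SZ)
		return -1;
	memset(msg, 0, MSG_SZ);
	memcpy(msg, fh, sizeof(*fh) + (size_t)fh->handle_size);
	return 0;
}

/* Receiver: rebuild a heap-allocated handle from the message.
 * Returns NULL on a bad size or allocation failure; caller frees. */
static struct file_handle *unpack_handle(const unsigned char msg[MSG_SZ])
{
	struct file_handle hdr, *fh;

	memcpy(&hdr, msg, sizeof(hdr));
	if (hdr.handle_size < 0 || hdr.handle_size > MAX_HANDLE_SZ)
		return NULL;
	fh = malloc(sizeof(*fh) + (size_t)hdr.handle_size);
	if (fh)
		memcpy(fh, msg, sizeof(*fh) + (size_t)hdr.handle_size);
	return fh;
}
```

The rank that did the path traversal would call pack_handle and broadcast the MSG_SZ-byte buffer; each receiver would call unpack_handle and then open the result against its own mount fd.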

Cheers, Andreas