Add a dentry op to handle automounting rather than abusing follow

Prev: watchdog: Add watchdog driver for OCTEON SOCs.
Next: arm: mm: do not set missing flags on ScorpionMP

From: Ian Kent on 24 Jul 2010 00:20

On Fri, 2010-07-23 at 16:09 +0100, David Howells wrote:
> Here's an updated patch that:
>
> (1) Fixes a bug in error handling (needs to use path_put_conditional not
> path_put).
>
> (2) Absorbs autofs4's decisions about whether to automount or not. This
> means that colour-ls won't cause automounts unless -L is supplied or it
> doesn't get a DT_DIR flag from getdents(). It also means that autofs4
> can dispense with this logic should it choose to use d_automount().

I think my statements about this were a little incorrect.

In the current kernel the check isn't imposed for ->lookup() but only
->d_revalidate() (the deosn't already exist vs the already exists
cases). Since ->d_lookup() (currently) leaves the dentry negative until
->mkdir() that could be used in the check. But then this may be starting
to get a little too autofs specific, maybe we should re-consider passing
the flags, I don't know.

In any case I'll have a go at using this as is, and see what happens.

>
> (3) Moves all the automounter logic out of __follow_mount() into
> follow_automount().
>
> (4) Stops pathwalk at the automount point and returns that point in the fs
> tree if it decides not to automount rather than reporting ELOOP (see its
> use of EXDEV for this).
>
> David
>
> ---
> From: David Howells <dhowells(a)redhat.com>
> Subject: [PATCH] Add a dentry op to handle automounting rather than abusing follow_link()
>
> Add a dentry op (d_automount) to handle automounting directories rather than
> abusing the follow_link() inode operation. The operation is keyed off a new
> inode flag (S_AUTOMOUNT).
>
> This makes it easier to add an AT_ flag to suppress terminal segment automount
> during pathwalk. It should also remove the need for the kludge code in the
> pathwalk algorithm to handle directories with follow_link() semantics.
>
> I've only changed __follow_mount() to handle automount points, but it might be
> necessary to change follow_mount() too. The latter is only used from
> follow_dotdot(), but any automounts on ".." should be pinned whilst we're using
> a child of it.
>
> I've also extracted the mount/don't-mount logic from autofs4 and included it
> here. It makes the mount go ahead anyway if someone calls open() or creat(),
> tries to traverse the directory, tries to chdir/chroot/etc. into the directory,
> or sticks a '/' on the end of the pathname. If they do a stat(), however,
> they'll only trigger the automount if they didn't also say O_NOFOLLOW.
>
> Signed-off-by: David Howells <dhowells(a)redhat.com>
> ---
>
> Documentation/filesystems/Locking | 2 +
> Documentation/filesystems/vfs.txt | 13 +++++
> fs/namei.c | 96 ++++++++++++++++++++++++++++++-------
> include/linux/dcache.h | 5 ++
> include/linux/fs.h | 2 +
> 5 files changed, 100 insertions(+), 18 deletions(-)
>
>
> diff --git a/Documentation/filesystems/Locking b/Documentation/filesystems/Locking
> index 96d4293..ccbfa98 100644
> --- a/Documentation/filesystems/Locking
> +++ b/Documentation/filesystems/Locking
> @@ -16,6 +16,7 @@ prototypes:
> void (*d_release)(struct dentry *);
> void (*d_iput)(struct dentry *, struct inode *);
> char *(*d_dname)((struct dentry *dentry, char *buffer, int buflen);
> + struct vfsmount *(*d_automount)(struct path *path);
>
> locking rules:
> none have BKL
> @@ -27,6 +28,7 @@ d_delete: yes no yes no
> d_release: no no no yes
> d_iput: no no no yes
> d_dname: no no no no
> +d_automount: no no no yes
>
> --------------------------- inode_operations ---------------------------
> prototypes:
> diff --git a/Documentation/filesystems/vfs.txt b/Documentation/filesystems/vfs.txt
> index 94677e7..31a9e8f 100644
> --- a/Documentation/filesystems/vfs.txt
> +++ b/Documentation/filesystems/vfs.txt
> @@ -851,6 +851,7 @@ struct dentry_operations {
> void (*d_release)(struct dentry *);
> void (*d_iput)(struct dentry *, struct inode *);
> char *(*d_dname)(struct dentry *, char *, int);
> + struct vfsmount *(*d_automount)(struct path *);
> };
>
> d_revalidate: called when the VFS needs to revalidate a dentry. This
> @@ -885,6 +886,18 @@ struct dentry_operations {
> at the end of the buffer, and returns a pointer to the first char.
> dynamic_dname() helper function is provided to take care of this.
>
> + d_automount: called when an automount dentry is to be traversed (optional).
> + This should create a new VFS mount record, mount it on the directory
> + and return the record to the caller. The caller is supplied with a
> + path parameter giving the automount directory to describe the automount
> + target and the parent VFS mount record to provide inheritable mount
> + parameters. NULL should be returned if someone else managed to make
> + the automount first. If the automount failed, then an error code
> + should be returned.
> +
> + This function is only used if S_AUTOMOUNT is set on the inode to which
> + the dentry refers.
> +
> Example :
>
> static char *pipefs_dname(struct dentry *dent, char *buffer, int buflen)
> diff --git a/fs/namei.c b/fs/namei.c
> index 868d0cb..6509ca5 100644
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -617,24 +617,82 @@ int follow_up(struct path *path)
> return 1;
> }
>
> +/*
> + * Perform an automount
> + * - return -EXDEV to tell __follow_mount() to stop and return the path we were
> + * called with.
> + */
> +static int follow_automount(struct path *path, unsigned flags, int res)
> +{
> + struct vfsmount *mnt;
> +
> + if (!path->dentry->d_op || !path->dentry->d_op->d_automount)
> + return -EREMOTE;
> +
> + /* We want to mount if someone is trying to open/create a file of any
> + * type under the mountpoint, wants to traverse through the mountpoint
> + * or wants to open the mounted directory.
> + *
> + * We don't want to mount if someone's just doing a stat and they've
> + * set AT_SYMLINK_NOFOLLOW - unless they're stat'ing a directory and
> + * appended a '/' to the name.
> + */
> + if (!(flags & LOOKUP_FOLLOW) &&
> + !(flags & (LOOKUP_CONTINUE | LOOKUP_DIRECTORY |
> + LOOKUP_OPEN | LOOKUP_CREATE)))
> + return -EXDEV;
> +
> + current->total_link_count++;
> + if (current->total_link_count >= 40)
> + return -ELOOP;
> +
> + mnt = path->dentry->d_op->d_automount(path);
> + if (IS_ERR(mnt))
> + return PTR_ERR(mnt);
> + if (!mnt) /* mount collision */
> + return 0;
> +
> + if (mnt->mnt_sb == path->mnt->mnt_sb &&
> + mnt->mnt_root == path->dentry) {
> + mntput(mnt);
> + return -ELOOP;
> + }
> +
> + dput(path->dentry);
> + if (res)
> + mntput(path->mnt);
> + path->mnt = mnt;
> + path->dentry = dget(mnt->mnt_root);
> + return 0;
> +}
> +
> /* no need for dcache_lock, as serialization is taken care in
> * namespace.c
> */
> -static int __follow_mount(struct path *path)
> +static int __follow_mount(struct path *path, unsigned flags)
> {
> - int res = 0;
> - while (d_mountpoint(path->dentry)) {
> - struct vfsmount *mounted = lookup_mnt(path);
> - if (!mounted)
> + struct vfsmount *mounted;
> + int ret, res = 0;
> + for (;;) {
> + while (d_mountpoint(path->dentry)) {
> + mounted = lookup_mnt(path);
> + if (!mounted)
> + break;
> + dput(path->dentry);
> + if (res)
> + mntput(path->mnt);
> + path->mnt = mounted;
> + path->dentry = dget(mounted->mnt_root);
> + res = 1;
> + }
> + if (!d_automount_point(path->dentry))
> break;
> - dput(path->dentry);
> - if (res)
> - mntput(path->mnt);
> - path->mnt = mounted;
> - path->dentry = dget(mounted->mnt_root);
> + ret = follow_automount(path, flags, res);
> + if (ret < 0)
> + return ret == -EXDEV ? 0 : ret;
> res = 1;
> }
> - return res;
> + return 0;
> }
>
> static void follow_mount(struct path *path)
> @@ -702,6 +760,8 @@ static int do_lookup(struct nameidata *nd, struct qstr *name,
> struct vfsmount *mnt = nd->path.mnt;
> struct dentry *dentry, *parent;
> struct inode *dir;
> + int ret;
> +
> /*
> * See if the low-level filesystem might want
> * to use its own hash..
> @@ -720,8 +780,10 @@ static int do_lookup(struct nameidata *nd, struct qstr *name,
> done:
> path->mnt = mnt;
> path->dentry = dentry;
> - __follow_mount(path);
> - return 0;
> + ret = __follow_mount(path, nd->flags);
> + if (unlikely(ret < 0))
> + path_put_conditional(path, nd);
> + return ret;
>
> need_lookup:
> parent = nd->path.dentry;
> @@ -1721,11 +1783,9 @@ static struct file *do_last(struct nameidata *nd, struct path *path,
> if (open_flag & O_EXCL)
> goto exit_dput;
>
> - if (__follow_mount(path)) {
> - error = -ELOOP;
> - if (open_flag & O_NOFOLLOW)
> - goto exit_dput;
> - }
> + error = __follow_mount(path, nd->flags);
> + if (error < 0)
> + goto exit_dput;
>
> error = -ENOENT;
> if (!path->dentry->d_inode)
> diff --git a/include/linux/dcache.h b/include/linux/dcache.h
> index eebb617..5380bff 100644
> --- a/include/linux/dcache.h
> +++ b/include/linux/dcache.h
> @@ -139,6 +139,7 @@ struct dentry_operations {
> void (*d_release)(struct dentry *);
> void (*d_iput)(struct dentry *, struct inode *);
> char *(*d_dname)(struct dentry *, char *, int);
> + struct vfsmount *(*d_automount)(struct path *);
> };
>
> /* the dentry parameter passed to d_hash and d_compare is the parent
> @@ -157,6 +158,7 @@ d_compare: no yes yes no
> d_delete: no yes no no
> d_release: no no no yes
> d_iput: no no no yes
> +d_automount: no no no yes
> */
>
> /* d_flags entries */
> @@ -389,6 +391,9 @@ static inline int d_mountpoint(struct dentry *dentry)
> return dentry->d_mounted;
> }
>
> +#define d_automount_point(dentry) \
> + (dentry->d_inode && IS_AUTOMOUNT(dentry->d_inode))
> +
> extern struct vfsmount *lookup_mnt(struct path *);
> extern struct dentry *lookup_create(struct nameidata *nd, int is_dir);
>
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 68ca1b0..a83fc81 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -235,6 +235,7 @@ struct inodes_stat_t {
> #define S_NOCMTIME 128 /* Do not update file c/mtime */
> #define S_SWAPFILE 256 /* Do not truncate: swapon got its bmaps */
> #define S_PRIVATE 512 /* Inode is fs-internal */
> +#define S_AUTOMOUNT 1024 /* Automount/referral quasi-directory */
>
> /*
> * Note that nosuid etc flags are inode-specific: setting some file-system
> @@ -269,6 +270,7 @@ struct inodes_stat_t {
> #define IS_NOCMTIME(inode) ((inode)->i_flags & S_NOCMTIME)
> #define IS_SWAPFILE(inode) ((inode)->i_flags & S_SWAPFILE)
> #define IS_PRIVATE(inode) ((inode)->i_flags & S_PRIVATE)
> +#define IS_AUTOMOUNT(inode) ((inode)->i_flags & S_AUTOMOUNT)
>
> /* the read-only stuff doesn't really belong here, but any other place is
> probably as bad and I don't want to create yet another include file. */

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

From: Ian Kent on 26 Jul 2010 11:30

On Mon, 2010-07-26 at 15:19 +0100, David Howells wrote:
> Ian Kent <raven(a)themaw.net> wrote:
>
> > > (4) Stops pathwalk at the automount point and returns that point in the fs
> > > tree if it decides not to automount rather than reporting ELOOP (see its
> > > use of EXDEV for this).
>
> Does it make autofs easier if d_op->d_automount() is allowed to return -EXDEV
> to request this? Then you can return it in Oz mode to allow the daemon to
> see/use the underlying mountpoint without recursing back into d_automount().

Yes, it's really useful.

>
> Ideally, the daemon would use AT_NO_AUTOMOUNT, but there's no way to pass that
> to sys_mount() or sys_umount().

The use of EXDEV, and if we had a way of saying we want a negative
dentry to trigger a mount, allows this mechanism to work for autofs
without any user space changes for direct and indirect mounts, at least
to match the current function.

Having said that though I've only just begun to test the various cases.

I know this was originally meant to deal with just the follow_link abuse
but the approach as it is appears to be able to resolve a long standing
deadlock bug autofs has with indirect mounts and affords a huge amount
of simplification to the autofs module code. I really hope we can work
out a suitable way to change the implementation to allow for negative
dentrys to trigger mounts. Of course it's quite possible I'll hit a snag
during testing but I hope not as I've spent a lot of time on alternate
approaches in the last few years.

Ian

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

From: Ian Kent on 26 Jul 2010 23:50

On Mon, 2010-07-26 at 16:54 +0100, David Howells wrote:
> Ian Kent <raven(a)themaw.net> wrote:
>
> > > Does it make autofs easier if d_op->d_automount() is allowed to return
> > > -EXDEV to request this? Then you can return it in Oz mode to allow the
> > > daemon to see/use the underlying mountpoint without recursing back into
> > > d_automount().
> >
> > Yes, it's really useful.
>
> I think what's required, then, is if d_automount() returns -EXDEV then:
>
> (1) If the dentry is terminal in the lookup path, then we just return -EXDEV
> to indicate to __follow_mount() that we really do want to stop there.
>
> (2) If the dentry is not terminal, then we convert the error to -EREMOTE to
> indicate that we can't complete the pathwalk as one of the earlier
> components is inaccessible.

Is this something others need?

Again, the exists vs not yet exists case for paths within indirect
autofs mounts. At the moment I can just set the flag on all dentrys in
the autofs fs and return EXDEV for non-empty directories in order to
return the dentry as a path component. OTOH if the dentry is a mount
embeded in the path and the mount fails we get a error return.

I could clear the flag on non-root parent dentrys during mkdir if this
is needed by others.

>
> See the attached patch.
>
> David
> ---
> diff --git a/fs/namei.c b/fs/namei.c
> index c154112..6c385d4 100644
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -653,8 +653,20 @@ static int follow_automount(struct path *path, unsigned flags, int res)
> return -ELOOP;
>
> mnt = path->dentry->d_op->d_automount(path);
> - if (IS_ERR(mnt))
> + if (IS_ERR(mnt)) {
> + /*
> + * The filesystem is allowed to return -EXDEV here to indicate
> + * they don't want to automount. For instance, autofs would do
> + * this so that its userspace daemon can mount on this dentry.
> + *
> + * However, we can only permit this if it's a terminal point in
> + * the path being looked up; if it wasn't then the remainder of
> + * the path is inaccessible and we should say so.
> + */
> + if (PTR_ERR(mnt) == -EXDEV && (flags & LOOKUP_CONTINUE))
> + return -EREMOTE;
> return PTR_ERR(mnt);
> + }
> if (!mnt) /* mount collision */
> return 0;
>
>

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

From: Ian Kent on 27 Jul 2010 09:00

On Tue, 2010-07-27 at 09:48 +0100, David Howells wrote:
> Ian Kent <raven(a)themaw.net> wrote:
>
> > Is this something others need?
>
> Not as far as I know... I think autofs is the only one doing out-of-kernel
> automounting.
>
> That doesn't mean it shouldn't be provided, though...
>
> > Again, the exists vs not yet exists case for paths within indirect
> > autofs mounts. At the moment I can just set the flag on all dentrys in
> > the autofs fs and return EXDEV for non-empty directories in order to
> > return the dentry as a path component. OTOH if the dentry is a mount
> > embeded in the path and the mount fails we get a error return.
>
> Seems redundant, but I'd say go with it for now. Maybe we can offload
> S_AUTOMOUNT to the dentry.

Yes, it is redundant but it is simple and works fine as long as I don't
apply the patch to return -EREMOTE for non-terminal dentrys. I'd need to
clear the flag, as below, on non-terminal dentrys if the -EREMOTE patch
is needed.

>
> > I could clear the flag on non-root parent dentrys during mkdir if this
> > is needed by others.
>
> I'm not sure that would actually matter, since it would come to
> follow_automount() at the same place.
>
> Note that someone who tries to stat() with AT_NO_AUTOMOUNT will cause the call
> to d_automount() to be suppressed and will see the negative or non-mounted
> directory. That might be okay for you.

With one small correction to the autofs patch I've been able to
successfully run the autofs Connectathon parser test without any user
space changes for both the nobrowse (the usual case) and the browse
(pre-created mount point directories) cases. The test consists of 230
mounts, with success and fail cases, for both direct and indirect
mounts. These mounts include various offset mount constructs as well. To
consider the test a success I require the mounts to also expire
correctly.

There are more tests to run, such as the program mount test, and the
often problematic submount test. Also, I have a submount based stress
test.

But, all in all, this is shaping up to be just what I need. Having one
single place to call back to the daemon, where I don't need to play
around with the inode mutex, makes me think this implementation will
finally resolve my deadlock issue with indirect mounts.

Ian

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

|
Pages: 1
Prev: watchdog: Add watchdog driver for OCTEON SOCs.
Next: arm: mm: do not set missing flags on ScorpionMP

Add a dentry op to handle automounting rather than abusing follow_link() [ver #2]