xstat: Add a pair of system calls to make extended file stats available [ver #6] [Kernel]


From: Trond Myklebust on 31 Jul 2010 15:10

On Sat, 2010-07-31 at 12:41 -0600, Andreas Dilger wrote:
> On 2010-07-30, at 12:11, Trond Myklebust wrote:
> > Your Mac has a perfectly functional CIFS client, as do your Linux boxes.
> > They both interoperate just fine with Samba, and would presumably
> > continue to do so if someone were to decide to reuse the ctime field on
> > your Samba box as storage for a create time.
>
> CIFS doesn't support symlinks (they just appear as the referenced file), so I've had applications that scan the filesystem recurse indefinitely due to symlinked directories on a CIFS share appearing as hard-linked directories on the client. This doesn't happen when the filesystem is accessed via NFS.

Sigh... So please explain how it would be useful to export that
particular filesystem through _both_ CIFS and NFS?

My point was that in most circumstances you want to export either
through CIFS or through NFS, but very rarely both.

I also made the point that converting ctime into a creation time would
break NFS, but it would be a limited breakage, mainly affecting the
client's ability to detect ACL changes, and possibly causing the inode
to get temporarily updated with stale attribute information on occasion
due to out-of-order RPC replies.

Trond

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

From: Jeff Layton on 1 Aug 2010 09:20

On Fri, 30 Jul 2010 14:11:46 -0400
Trond Myklebust <trond.myklebust(a)fys.uio.no> wrote:

> On Fri, 2010-07-30 at 13:55 -0400, Phil Pishioneri wrote:
> > On 7/22/10 2:59 PM, Trond Myklebust wrote:
> > > The fact remains that most of us would be hard pressed to name an
> > > application
> >
> > Microsoft Office?
> >
> > > that requires you to share the same dataset to both
> > > Windows/CIFS and posix NFS clients.
> >
> > NFS client: Mac OS X (NFSv3, since v4 on it is still alpha *cough*).
> >
> > > tends to discourage mixing the two environments.
> >
> > Or is "discourage" not a strong enough term to describe that we shouldn't
> > be doing this?
> >
> > -Phil
>
> Your Mac has a perfectly functional CIFS client, as do your Linux boxes.
> They both interoperate just fine with Samba, and would presumably
> continue to do so if someone were to decide to reuse the ctime field on
> your Samba box as storage for a create time.
>
> Trond
>

It's not so much particular applications that require access to the
same data via NFS and CIFS. There is, however, a common desire to share
the same data with clients running different OSes.

All of the unix CIFS clients that I know of (including Linux's) trail
NFS in several areas. For instance, if you need to have the same data
accessible by multiple users using their own credentials then you need
multiple mounts.

--
Jeff Layton <jlayton(a)redhat.com>

From: Jeff Layton on 1 Aug 2010 09:40

On Thu, 29 Jul 2010 09:04:01 +1000
Neil Brown <neilb(a)suse.de> wrote:

> On Wed, 28 Jul 2010 18:28:02 +0100
> David Howells <dhowells(a)redhat.com> wrote:
>
> > Neil Brown <neilb(a)suse.de> wrote:
> >
> > > ctime and mtime have real cache-coherence semantics which require them being
> > > updated by the kernel (whether the cache is on an NFS client, in a backup
> > > archive, or in a .o translation of a .c file).
> >
> > So does creation time, at least for CIFS caching. Creation time has potential
> > for spotting when the object at a pathname has changed for something else,
> > given the lack of inode number and inode generation from windows servers.
> > Creation time gives us one more datum to use.
>
> This justifies for me why a CIFS client would want to extract the
> creation-time from the CIFS protocol, but not why you want to expose it via a
> generic interface.
> The kernel/filesystem doesn't need to maintain creation-time to meet this
> need, only the CIFS server needs to maintain it - the kernel/filesystem just
> needs to provide somewhere to store it - xattrs.
>
> Given that we have an extensible attribute framework, it seems wrong to be
> adding new attributes to *stat. If a given filesystem wants to store certain
> attributes more efficiently, then it is welcome to intercept xattr calls and
> store (say) "cifs.birthtime" directly at a known offset in the inode.
>

The problem with the above approach is that you're assuming that the
data in question is always accessed via the CIFS server. If someone
comes along and messes with the data outside of CIFS, then samba won't
have knowledge of that and the birthtime will be wrong.

There's some history behind this as well -- samba tracks windows ACLs
via xattr and it can be very problematic keeping those up to date when
the data is accessed outside of samba.

I think presenting this data via xattr makes the most sense. It's
simple and, as Neil points out, it also provides us with a clearly
settable interface. If we ever get an xstat-like syscall, we can always
present the same data via that as well.

I also think it's quite reasonable to consider tracking birthtime in a
generic inode field. In the absence of that, filesystems could track
this themselves in their filesystem-specific inode structs.

Furthermore, I'll go ahead and propose the following (simple) semantics:

1) birthtime is initialized to the current time when a new inode is
created

2) it's settable via the xattr to an arbitrary value

Either way, the xattr for this ought to be named the same on all
filesystems. Samba shouldn't need to know or care what the underlying
filesystem is, as long as it presents the correct xattr.

That should make samba happy, and be reasonably simple to implement.
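
As an illustration of the two rules above, here is a minimal sketch in Python. It is a toy in-memory model, not kernel code: the `Inode` class and the xattr name `system.birthtime` are hypothetical, chosen only for the example, since the thread does not settle on a name.

```python
import time


class Inode:
    """Toy in-memory inode; stands in for a filesystem's real inode struct."""

    # Hypothetical xattr name; the only requirement stated in the thread is
    # that the name be the same on every filesystem.
    BIRTHTIME_XATTR = "system.birthtime"

    def __init__(self, now=None):
        # Rule 1: birthtime is initialised to the current time at creation.
        self.birthtime = time.time() if now is None else now
        self.data = b""

    def getxattr(self, name):
        if name != self.BIRTHTIME_XATTR:
            raise KeyError(name)
        return repr(self.birthtime).encode()

    def setxattr(self, name, value):
        if name != self.BIRTHTIME_XATTR:
            raise KeyError(name)
        # Rule 2: any value may be stored, e.g. by Samba or an archiver.
        self.birthtime = float(value.decode())

    def write(self, data):
        # Ordinary writes leave birthtime untouched.
        self.data = data
```

In this model a server such as Samba would only ever call `getxattr` and `setxattr` with the one agreed name, and would not need to know which filesystem sits underneath.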

--
Jeff Layton <jlayton(a)redhat.com>

From: Neil Brown on 2 Aug 2010 21:20

On Thu, 29 Jul 2010 17:15:15 +0100
David Howells <dhowells(a)redhat.com> wrote:

> Neil Brown <neilb(a)suse.de> wrote:
>
> > This justifies for me why a CIFS client would want to extract the
> > creation-time from the CIFS protocol, but not why you want to expose it via a
> > generic interface.
>
> It would also be easier for NFSD if the creation time was in struct kstat.
> It's included as an optional element in NFSv4. The same goes for the data
> version number. I'm not sure about the inode generation, I suspect that's used
> as part of the FH construction.
>
> However, someone was talking about a userspace NFS daemon, and there they may
> want all three bits. Even Samba may want multiple bits. Calling getxattr
> multiple times per file starts to add up, even for internal values.
>
> Consider further: NFS, for example, could be made to retrieve the creation time
> from the server. This can be merged with the attribute fetch done by the
> getattr() call, or it could be done separately by getxattr. Unless it's stored
> in RAM, that's one NFS RPC op versus two. Okay, that's a bit of an artificial
> example, but still.
>
> > Given that we have an extensible attribute framework, it seems wrong to be
> > adding new attributes to *stat. If a given filesystem wants to store certain
> > attributes more efficiently, then it is welcome to intercept xattr calls and
> > store (say) "cifs.birthtime" directly at a known offset in the inode.
>
> It's not attribute storage I'm thinking about, but making attribute retrieval
> more efficient.
>
> > The flip-side of extracting these various attributes is setting them.
>
> I acknowledge that if we went down the getxattr() route, then that
> automatically makes setxattr() the obvious candidate for setting things.
>
> But think about it another way: what if you want to set several attributes?
> You have to make a bunch of setxattr() calls. But what if it were possible to
> do all of chmod, chgrp, chown, truncate, utimes, set_btime, etc. all in one go,
> atomically? We more or less have this internally in the kernel, and it might
> stand to be exposed to userspace.
>
> It might, for example, make untarring that little bit more efficient.
>
> > I'm still pondering those extra flags:
> > FS_SPECIAL_FL
> > FS_AUTOMOUNT_FL
> > FS_AUTOMOUNT_ANY_FL
> > FS_REMOTE_FL
> > FS_ENCRYPTED_FL
> > FS_OFFLINE_FL
> >
> > They sound like they might be useful, they are not file-metadata (like
> > btime) but rather implementation details (like st_blocks). So it is probably
> > sensible to include them as you have done.
>
> I've split these away from ioc flags as ioc flags is very ext2/3/4 centric, and
> those filesystems happily create their own ioc flags sets without updating the
> master set.
>
> > If a filesystem is mounted on an network-block-device, or a loop-back of a
> > file on NFS, is FS_REMOTE_FL set?
> > Is ROT13 enough for FS_ENCRYPTED_FL to be set?
> > If the NFS server is "not responding, still trying", should FS_OFFLINE_FL get
> > set on all files?
> > And I cannot even guess at the difference between the two FS_AUTOMOUNT flags.
> > I'm sure it is something useful, but doco would be good. Should one of them
> > be set on mountpoints that NFSv4 detects from the server?
>
> Yeah. I have plans to write documentation for it, but I'd like to have a
> clearer idea of what the interface might be before doing that.
>
> But to give you an idea of the flags:
>
> (*) FS_SPECIAL_FL - Kernel API file from a quasi-filesystem such as /proc or
> /sys - the sort of thing you might not want to expose through NFSD.
>
> (*) FS_AUTOMOUNT_FL - A named automount/referral point. You attempt to
> transit this directory and the backing fs will mount something over the
> top.
>
> (*) FS_AUTOMOUNT_ANY_FL - A directory in which you can look up a non-existent
> directory entry, which will cause that dirent to be fabricated and the
> target filesystem be mounted over the top. Examples include looking up
> arbitrary cell names in /afs, or arbitrary hostnames in autofs or amd
> indirect mount directories.
>
> (*) FS_REMOTE_FL - A filesystem object that is assumed not to be stored on the
> computer issuing the request. It would be quite nice to have loopback NFS
> not set the remote flag and to have NBD-mounted filesystems set the
> remote flag, but this can get quite messy with things like overmounts.
>
> My thought is that this can be used by a GUI to choose its icons for
> files.
>
> (*) FS_ENCRYPTED_FL - A file that is stored encrypted and that presumably
> needs a key providing to decrypt it. CIFS has an attribute bit for this
> (ATTR_ENCRYPTED).
>
> (*) FS_OFFLINE_FL - A file that isn't immediately available, and that requires
> a connection to the data store to be made. CIFS has an attribute bit for
> this (ATTR_OFFLINE). AFS has a field in its volume data and an error code
> indicating that a volume is offline and cannot currently be accessed.
>
> This could be set by network filesystems for which the network or the
> server is absent for example. Especially if the lightweight stat is
> requested (non-blocking in essence).

Thanks for these. It particularly helps when you identify how the flag might
be used - guiding GUI icon choice is certainly valid and tells me that if I
don't set the flag 'correctly' (maybe because it is too difficult) then it
isn't the end of the world.

I get the AUTOMOUNT distinction too - FS_AUTOMOUNT_ANY_FL would be good for a
GUI as it could allow you to type in a filename for it to try to follow.
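
To make the "GUI icon choice" use concrete, here is a rough sketch in Python. Everything in it is an assumption made for illustration: the bit values, the `FsFlags` and `choose_icon` names, the icon names and the priority order are invented, and only the flag names themselves come from the proposal.

```python
import enum


class FsFlags(enum.IntFlag):
    # Bit values are made up for this sketch; the thread does not give them.
    SPECIAL = 0x01
    AUTOMOUNT = 0x02
    AUTOMOUNT_ANY = 0x04
    REMOTE = 0x08
    ENCRYPTED = 0x10
    OFFLINE = 0x20


def choose_icon(flags):
    """Pick an icon name from hint flags; unset flags fall through to 'plain'."""
    if flags & FsFlags.OFFLINE:
        return "offline"
    if flags & FsFlags.ENCRYPTED:
        return "locked"
    if flags & (FsFlags.AUTOMOUNT | FsFlags.AUTOMOUNT_ANY):
        return "mountpoint"
    if flags & FsFlags.REMOTE:
        return "network"
    if flags & FsFlags.SPECIAL:
        return "system"
    return "plain"
```

Because the flags are only hints here, a filesystem that leaves one unset costs the user nothing worse than a plainer icon.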

I'm not sure exactly how FS_ENCRYPTED_FL would be used - if the gui might be
prompted to ask for a key there would either need to be a completely general
interface for presenting keys, or the flag should be specific to CIFS and
should mean that a key must be given to CIFS to unlock the file.

Similarly, what can you do with an OFFLINE file? Do CIFS and AFS offline
files behave the same way? If not there should be two different flags. If
so then that behaviour should be specified with the flag ... unless this flag
is just for GUI cosmetics too.

Anyway, I've been thinking more about this and have refined my position
somewhat. I'll present it here for what it is worth - feel free to ignore
bits you don't like.

Your proposed 'xstat' seems to combine a number of different goals - doing
that is always a bit dangerous as you have to defend it on multiple fronts...

I see the separate goals are:
A/ allowing attributes to be accessed independently - an explicit list of
required attributes is given and the FS doesn't need to collect the other
attributes.
B/ allowing synthetic attributes to be identified - if the FS doesn't
natively support some attribute but must synthesise it, you can now
discover that fact
C/ adding an ad-hoc collection of new attributes that filesystems can return if
they happen to support them
D/ doing all the above with a single system call for efficiency.

I think pushing all these together is asking for trouble - arguments about one
aspect will interfere with completion of the others.

Given that we already have the 'xattr' interface it seems most sensible to
achieve 'A' by defining xattr names for all 'standard' attributes and
handling them in a common library function. Maybe 'linux.inum' to get the
inode numbers, etc. There is doubtlessly a better name than 'linux.inum'.
I understand that you tried something like this before and it was rejected.
To borrow Linus's hyperbole from up-thread:
>> Hey, whoever denounced it as stupid obviously doesn't have the neurons
>> to go around to be involved in the discussion. Ignore them.

With that in place, 'B' can be achieved by the simple expedient of not
listing (in listxattr) the system attributes that the filesystem doesn't
support natively. So if a filesystem doesn't support uid and has to fake it,
then it would not list 'linux.uid' in the xattr list, but will still return
the faked uid if explicitly asked for it.
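
A small Python model may make the listing rule easier to see. It is only a sketch under stated assumptions: `ToyFs` and `is_synthetic` are hypothetical names, and the `linux.*` attribute names are the placeholders used above, not an existing namespace.

```python
class ToyFs:
    """Toy filesystem: some attributes are stored natively, others are faked."""

    def __init__(self, native, synthetic):
        self._native = dict(native)        # really stored by the filesystem
        self._synthetic = dict(synthetic)  # made up on demand, e.g. a fake uid

    def listxattr(self):
        # Only natively supported attributes are advertised.
        return sorted(self._native)

    def getxattr(self, name):
        # Explicit requests are answered either way.
        if name in self._native:
            return self._native[name]
        if name in self._synthetic:
            return self._synthetic[name]
        raise KeyError(name)


def is_synthetic(fs, name):
    """Caller-side check: readable but not listed means the value is faked."""
    try:
        fs.getxattr(name)
    except KeyError:
        return False
    return name not in fs.listxattr()
```

With `fs = ToyFs({"linux.inum": b"42"}, {"linux.uid": b"0"})`, the uid can still be read but does not appear in `fs.listxattr()`, which is how a caller would discover that it is synthesised.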

The various proposed new attributes (C) could then be added one at a time or
as groups depending on how much opposition they receive. Some might be
generic (linux.*) while others should possibly be filesystem-specific (FAT.*,
CIFS.*).

This could result in the need to make multiple system calls to get all of the
attributes that you want. Maybe this would be a problem ... I keep hearing
that in Linux context switches are really cheap and system calls are also
really cheap, so maybe it isn't a problem.

However if you can demonstrate a cost in a credible workload you would then
have ammunition to defend a new syscall (D) which would get multiple xattrs.
And maybe one that would set multiple xattrs.
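
For reference, the multi-call pattern whose cost is in question looks roughly like this from userspace. This is a sketch in Python using the standard library's `os.getxattr` (available on Linux only); the `get_attrs` helper and its `getter` parameter are hypothetical, added so the loop can be exercised without a filesystem that supports xattrs.

```python
import errno
import os


def get_attrs(path, names, getter=None):
    """Fetch several xattrs, one call per name; skip names the file lacks."""
    if getter is None:
        getter = os.getxattr  # Linux-only in the Python standard library
    result = {}
    for name in names:
        try:
            result[name] = getter(path, name)
        except OSError as exc:
            # ENODATA: attribute not present; ENOTSUP: no xattr support here.
            if exc.errno not in (errno.ENODATA, errno.ENOTSUP):
                raise
    return result
```

Each name costs one system call here, so timing a loop like this over a realistic file set is one way to produce the evidence that a batched call (D) would need.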

Thus you can address each goal one at a time and the more contentious parts
can be delayed without interfering with the clearly valuable parts.

Whether a particular attribute were stored in kstat, or whether the fs needed
extra disk access to get the attribute would be entirely internal details
which we are free to get wrong the first few times and then fix up once we
understand all the issues properly.

> > Providing everybody imposes exactly the same semantics for "creation time"...
>
> We can invent some for Linux. The time at which an inode is created would seem
> to be a sensible course, but with the ability for the creation time to be set
> by archiving tools. Overwriting an existing inode by truncating it and then
> writing it should keep the creation time of the inode.
>
> I think this would then be the same behaviour as Windows.

Yes, it seems that supporting the Windows behaviour is the only actual
use-case that has been suggested - so I think that we should be explicit that
this attribute has exactly the same semantics as the Windows attribute, i.e.
we shouldn't invent some, we should precisely copy them.

>
> > "well derided" like high-mem and SMP support? or "real-time" support and
> > priority inheritance?
> > I guess the deriders are wrong, and will eventually realise that they are
> > wrong. The difficult bit is we cannot know how long it will take them, or
> > how much you have to care.
>
> Almost everyone hates the idea of having a stat function with a variable length
> buffer. To quote Linus:
>
> the "buffer+buflen" thing is still disgusting.
>
> You might be right, though: the deriders might be wrong; it just doesn't help
> at this particular point in time.

We do seem to suffer from the squeaky-wheel syndrome - the louder someone
complains the more attention they are given - I'm sorry I wasn't listening
when you first suggested using xattrs for accessing creation-time - maybe I
could have squeaked loudly too .... though probably I wouldn't have
considered the issues deeply enough by that time.

(Look - getxattr has buffer+buflen ! - it may well be disgusting, but
following established practice is good for consistency).

>
> > (unambiguous documentation!! the rest is just details)
>
> I normally do write documentation. It's just that I don't want to have to keep
> changing the docs as well as constantly rewriting the code.

I understand that desire ... but with an interface, the docs really are just
as important as the code!

thanks,
NeilBrown

From: Jeremy Allison on 13 Aug 2010 15:20

On Fri, Aug 13, 2010 at 09:06:28PM +0200, Jan Engelhardt wrote:
>
> On Friday 2010-08-13 19:54, Jeremy Allison wrote:
> >On Fri, Aug 13, 2010 at 08:54:32AM -0400, J. Bruce Fields wrote:
> >> On Sun, Aug 08, 2010 at 06:05:01AM -0700, Jeremy Allison wrote:
> >> > We don't need to ape Windows in everything.
> >> > The coming ACL disaster will show that (we will go from an ACL
> >> > model that is slightly too complex to use, to one that is impossibly
> >> > complex to use :-).
> >>
> >> Care to elaborate?
> >
> >POSIX ACLs -> RichACLs (NT-style). Not criticising Andreas here,
> >people are asking for this. But Windows ACLs are a nightmare
> >beyond human comprehension :-). In the "too complex to be
> >usable" camp.
>
> Well, for one, ACLs in NT can be recursive IIRC. You can't say that of Linux
> ACLs - instead you have to setfacl -R and setfacl -Rd to give one user access
> to a directory and all its subdirs including future new inodes.

You do realize that Windows does exactly the same thing under
the covers, right ? Watch SMB or SMB2 traffic between a client
and Windows server when someone changes an ACL sometime :-).

Jeremy
