xstat: Add a pair of system calls to make extended file stats available [ver #6] [Kernel]

From: David Howells on 22 Jul 2010 12:10

Jan Engelhardt <jengelh(a)medozas.de> wrote:

> There just is no way currently to store creation times.

What do you mean? Ext4 and BtrFS can both do so; it's just that there's no
user interface to it.

David

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

From: David Howells on 28 Jul 2010 13:30

Neil Brown <neilb(a)suse.de> wrote:

> ctime and mtime have real cache-coherence semantics which require them being
> updated by the kernel (whether the cache is on an NFS client, in a backup
> archive, or in a .o translation of a .c file).

So does creation time, at least for CIFS caching. Creation time has potential
for spotting when the object at a pathname has changed for something else,
given the lack of inode number and inode generation from Windows servers.
Creation time gives us one more datum to use.
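A minimal sketch of the revalidation idea, assuming a client cache that records whatever identity data the server offers. The `CachedIdentity` record and the `object_replaced()` helper are hypothetical names for illustration, not CIFS client code. Inode number plus generation is preferred when both are available, and creation time is the fallback datum for servers that supply neither.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CachedIdentity:
    """What a hypothetical client cache remembers about the object at a pathname."""
    ino: Optional[int]         # None when the server supplies no usable inode number
    generation: Optional[int]  # None when the server supplies no inode generation
    btime: Optional[float]     # creation time, if the server reports one


def object_replaced(cached: CachedIdentity, current: CachedIdentity) -> Optional[bool]:
    """True/False if replacement can be decided, None if no usable datum exists."""
    # Prefer inode number + generation when both snapshots carry them.
    if None not in (cached.ino, cached.generation, current.ino, current.generation):
        return (cached.ino, cached.generation) != (current.ino, current.generation)
    # Otherwise fall back to creation time: a different object now at the
    # same path will normally have been created at a different time.
    if cached.btime is not None and current.btime is not None:
        return cached.btime != current.btime
    return None  # nothing to compare; the caller must revalidate some other way
```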

> The only role the kernel might have would be setting the 'creation time' when
> the file was created, but it seems even that isn't always what is wanted,
> because people don't so much care about the time of creation of the
> container-on-disk, but the time of creation of the data-content.

That should be a timestamp in the content itself, not a filesystem metadata
timestamp.

> I would want to see a pretty convincing use-case that cannot be solved with
> xattrs before 'creation time' was added to a generic kernel interface.

Then there's no point even considering this. You could emulate the entirety
of stat() with getxattr(). I've previously posted a patch to implement the
retrieval of creation time, inode gen and data version as xattrs and been told
that it's the wrong way to do it and I should extend stat instead.

> So just use xattrs and don't involve the kernel in any detailed knowledge of
> this value.

Why not? BSD has it in its stat struct. Windows has it in its Win32
equivalents. Samba for one will look for it there, and use it if it is present.

Using an xattr means an extra pathwalk and extra locking per access for any
program that wants it. It's a reasonable bet such a program will also be
stat'ing the file it wants the creation time for.

If we are going to extend stat anyway, then why not make out a short list of
extra things we could usefully return and consider adding them? Something
like creation time is reasonably easy to come by for little extra overhead.
Ext4, for example, retains a copy of it in RAM in its inode struct.
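To illustrate the cost argument, here is a Python sketch of a program that wants creation time. Where the platform's stat already carries it (Python exposes `st_birthtime` on some platforms, such as BSD and macOS), one lookup suffices; otherwise a second, xattr-based lookup is needed. The function name and the xattr name `user.hypothetical.btime` are invented for this example and are not names any filesystem is assumed to provide.

```python
import os
from typing import Optional

# Hypothetical attribute name, for illustration only.
HYPOTHETICAL_BTIME_XATTR = "user.hypothetical.btime"


def creation_time(path: str) -> Optional[float]:
    st = os.stat(path)  # first lookup: path walk plus attribute fetch
    btime = getattr(st, "st_birthtime", None)
    if btime is not None:
        return float(btime)  # stat already had it: one call was enough
    if not hasattr(os, "getxattr"):
        return None  # no xattr support in this Python build
    try:
        # Second lookup: another path walk just for one more timestamp.
        raw = os.getxattr(path, HYPOTHETICAL_BTIME_XATTR)
    except OSError:
        return None  # attribute absent, or xattrs unsupported on this filesystem
    try:
        return float(raw.decode())
    except ValueError:  # also covers UnicodeDecodeError
        return None
```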

> Maybe xstat should take a list of xattrs to be retrieved as well?? or maybe
> not.

The idea of xstat() having a variable-length buffer and variable arguments has
been well derided. It ain't going to happen, much though I'd like it to. I'd
quite like to offer the opportunity to return the security label, for example.

David

From: David Howells on 29 Jul 2010 12:20

Neil Brown <neilb(a)suse.de> wrote:

> This justifies for me why a CIFS client would want to extract the
> creation-time from the CIFS protocol, but not why you want to expose it via a
> generic interface.

It would also be easier for NFSD if the creation time was in struct kstat.
It's included as an optional element in NFSv4. The same goes for the data
version number. I'm not sure about the inode generation, I suspect that's used
as part of the FH construction.

However, someone was talking about a userspace NFS daemon, and there they may
want all three bits. Even Samba may want multiple bits. Calling getxattr
multiple times per file starts to add up, even for internal values.

Consider further: NFS, for example, could be made to retrieve the creation time
from the server. This can be merged with the attribute fetch done by the
getattr() call, or it could be done separately by getxattr. Unless it's stored
in RAM, that's one NFS RPC op versus two. Okay, that's a bit of an artificial
example, but still.

> Given that we have an extensible attribute framework, it seems wrong to be
> adding new attributes to *stat. If a given filesystem wants to store certain
> attributes more efficiently, then it is welcome to intercept xattr calls and
> store (say) "cifs.birthtime" directly at a known offset in the inode.

It's not attribute storage I'm thinking about, but making attribute retrieval
more efficient.

> The flip-side of extracting these various attributes is setting them.

I acknowledge that if we went down the getxattr() route, then that
automatically makes setxattr() the obvious candidate for setting things.

But think about it another way: what if you want to set several attributes?
You have to make a bunch of setxattr() calls. But what if it were possible to
do chmod, chgrp, chown, truncate, utimes, set_btime, etc. all in one go,
atomically? We more or less have this internally in the kernel, and it might
stand to be exposed to userspace.

It might, for example, make untarring that little bit more efficient.
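For comparison, this is roughly what an extractor does per file today: one call per attribute, each with its own path walk, and no atomicity across them. `apply_member_metadata()` is a hypothetical helper; `os.truncate`, `os.chmod` and `os.utime` are the standard Python wrappers around truncate(2), chmod(2) and, on Linux, utimensat(2). Ownership changes are left out because they usually need privilege.

```python
import os
from typing import List


def apply_member_metadata(path: str, size: int, mode: int,
                          atime: float, mtime: float) -> List[str]:
    """Restore archive-member metadata one attribute at a time."""
    calls = []
    os.truncate(path, size)          # separate syscall, separate path walk
    calls.append("truncate")
    os.chmod(path, mode)             # ...and another
    calls.append("chmod")
    # chown/chgrp would go here too, given sufficient privilege.
    os.utime(path, (atime, mtime))   # set times last, since truncate updates mtime
    calls.append("utime")
    # There is no portable call for setting a creation time, and nothing makes
    # this sequence atomic: another process can observe intermediate states.
    return calls
```

A single combined setattr-style call would collapse these steps and close the window in which a half-restored file is visible.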

> I'm still pondering those extra flags:
> FS_SPECIAL_FL
> FS_AUTOMOUNT_FL
> FS_AUTOMOUNT_ANY_FL
> FS_REMOTE_FL
> FS_ENCRYPTED_FL
> FS_OFFLINE_FL
>
> They sound like they might be useful, they are not file-metadata (like
> btime) but rather implementation details (like st_blocks). So it is probably
> sensible to include them as you have done.

I've split these away from the ioc flags as those are very ext2/3/4-centric, and
those filesystems happily create their own ioc flags sets without updating the
master set.

> If a filesystem is mounted on an network-block-device, or a loop-back of a
> file on NFS, is FS_REMOTE_FL set?
> Is ROT13 enough for FS_ENCRYPTED_FL to be set?
> If the NFS server is "not responding, still trying", should FS_OFFLINE_FL get
> set on all files?
> And I cannot even guess at the difference between the two FS_AUTOMOUNT flags.
> I'm sure it is something useful, but doco would be good. Should one of them
> be set on mountpoints that NFSv4 detects from the server?

Yeah. I have plans to write documentation for it, but I'd like to have a
clearer idea of what the interface might be before doing that.

But to give you an idea of the flags:

(*) FS_SPECIAL_FL - Kernel API file from a quasi-filesystem such as /proc or
/sys - the sort of thing you might not want to expose through NFSD.

(*) FS_AUTOMOUNT_FL - A named automount/referral point. You attempt to
transit this directory and the backing fs will mount something over the
top.

(*) FS_AUTOMOUNT_ANY_FL - A directory in which you can look up a non-existent
directory entry, which will cause that dirent to be fabricated and the
target filesystem be mounted over the top. Examples include looking up
arbitrary cell names in /afs, or arbitrary hostnames in autofs or amd
indirect mount directories.

(*) FS_REMOTE_FL - A filesystem object that is assumed not to be stored on the
computer issuing the request. It would be quite nice to have loopback NFS
not set the remote flag and to have NBD-mounted filesystems set the
remote flag, but this can get quite messy with things like overmounts.

My thought is that this can be used by a GUI to choose its icons for
files.

(*) FS_ENCRYPTED_FL - A file that is stored encrypted and that presumably
needs a key providing to decrypt it. CIFS has an attribute bit for this
(ATTR_ENCRYPTED).

(*) FS_OFFLINE_FL - A file that isn't immediately available, and that requires
a connection to the data store to be made. CIFS has an attribute bit for
this (ATTR_OFFLINE). AFS has a field in its volume data and an error code
indicating that a volume is offline and cannot currently be accessed.

This could be set by network filesystems when, for example, the network or
the server is absent, especially if the lightweight stat is requested
(non-blocking in essence).
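A sketch of how userspace might consume these flags, following the GUI-icon idea above. Only the flag names come from the proposal; the bit values, the `XstatFlags` class and `pick_icon()` are made up for illustration and do not reflect any real ABI.

```python
from enum import IntFlag


class XstatFlags(IntFlag):
    """Flag names from the proposal; the bit values are invented for this sketch."""
    FS_SPECIAL_FL = 0x01
    FS_AUTOMOUNT_FL = 0x02
    FS_AUTOMOUNT_ANY_FL = 0x04
    FS_REMOTE_FL = 0x08
    FS_ENCRYPTED_FL = 0x10
    FS_OFFLINE_FL = 0x20


def pick_icon(flags: XstatFlags, is_dir: bool) -> str:
    """Hypothetical GUI helper: choose an icon name from the reported flags."""
    if flags & XstatFlags.FS_OFFLINE_FL:
        return "offline"      # don't try to open or thumbnail it
    if flags & (XstatFlags.FS_AUTOMOUNT_FL | XstatFlags.FS_AUTOMOUNT_ANY_FL):
        return "mount-point"  # entering it may trigger a mount
    base = "folder" if is_dir else "file"
    if flags & XstatFlags.FS_ENCRYPTED_FL:
        base += "-locked"
    if flags & XstatFlags.FS_REMOTE_FL:
        base += "-remote"
    return base
```

FS_SPECIAL_FL is deliberately ignored here; it would matter more to something like NFSD deciding what to export than to an icon chooser.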

> It would probably help to keep that sort of decision process (complete with
> who to blame) documented in the change-log entry, but one never thinks of
> doing that at the time.

There have been a lot of conflicting opinions on this. I'm not sure rendering
them into a list in the change log would be that useful.

> Providing everybody imposes exactly the same semantics for "creation time"...

We can invent some for Linux. The time at which an inode is created would seem
to be a sensible course, but with the ability for the creation time to be set
by archiving tools. Overwriting an existing inode by truncating it and then
writing it should keep the creation time of the inode.

I think this would then be the same behaviour as Windows.
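The proposed rule can be modelled in a few lines. `ToyFS` and `Inode` are hypothetical, and a counter stands in for the clock. Creation stamps btime once, truncate-and-rewrite keeps the same inode and therefore the same btime, and an archiver may set btime explicitly. The write-to-temporary-then-rename() case, where a fresh inode brings a fresh btime, is an inference from tying btime to the inode rather than something stated above.

```python
import itertools
from dataclasses import dataclass, field

_clock = itertools.count(1)  # stand-in for the system clock: strictly increasing ticks


@dataclass
class Inode:
    data: bytes = b""
    btime: int = field(default_factory=lambda: next(_clock))  # stamped once, at creation


class ToyFS:
    """Toy model of the proposed semantics; not real kernel or filesystem code."""

    def __init__(self) -> None:
        self.names = {}

    def create(self, name: str) -> None:
        self.names[name] = Inode()

    def truncate_and_write(self, name: str, data: bytes) -> None:
        self.names[name].data = data  # same inode, so btime is untouched

    def replace_via_rename(self, name: str, data: bytes) -> None:
        self.names[name] = Inode(data)  # new inode renamed over the old name

    def set_btime(self, name: str, btime: int) -> None:
        self.names[name].btime = btime  # e.g. an archiver restoring a saved value
```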

> "well derided" like high-mem and SMP support? or "real-time" support and
> priority inheritance?
> I guess the deriders are wrong, and will eventually realise that they are
> wrong. The difficult bit is we cannot know how long it will take them, or
> how much you have to care.

Almost everyone hates the idea of having a stat function with a variable length
buffer. To quote Linus:

the "buffer+buflen" thing is still disgusting.

You might be right, though: the deriders might be wrong; it just doesn't help
at this particular point in time.

> (unambiguous documentation!! the rest is just details)

I normally do write documentation. It's just that I don't want to have to keep
changing the docs as well as constantly rewriting the code.

David

From: David Howells on 31 Jul 2010 13:00

utz lehmann <lkml123(a)s2y4n2c.de> wrote:

> When abusing an existing time stamp use atime not ctime please.
> ctime has its uses. atime was just a mistake and is nearly useless.

CacheFiles currently uses atime to determine least-recently-usedness.
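As a rough illustration of atime-based LRU culling (not the actual CacheFiles or cachefilesd logic), this hypothetical helper lists the files in a cache directory, least recently accessed first. It also shows why mounting the cache with noatime would defeat the scheme: reads would no longer move a file's atime forward.

```python
import os
from typing import List


def lru_order(cache_dir: str) -> List[str]:
    """Return names of regular files in cache_dir, least recently accessed first."""
    with os.scandir(cache_dir) as it:
        files = [e for e in it if e.is_file(follow_symlinks=False)]
    # Oldest atime first: these are the culling candidates.
    files.sort(key=lambda e: e.stat(follow_symlinks=False).st_atime_ns)
    return [e.name for e in files]
```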

David

From: Andreas Dilger on 31 Jul 2010 14:50

On 2010-07-30, at 12:11, Trond Myklebust wrote:
> Your Mac has a perfectly functional CIFS client, as do your Linux boxes.
> They both interoperate just fine with Samba, and would presumably
> continue to do so if someone were to decide to reuse the ctime field on
> your Samba box as storage for a create time.

CIFS doesn't support symlinks (they just appear as the referenced file), so I've had applications that scan the filesystem recurse indefinitely due to symlinked directories on a CIFS share appearing as hard-linked directories on the client. This doesn't happen when the filesystem is accessed via NFS.
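The usual defence for a scanner is to remember the (st_dev, st_ino) pair of every directory it enters and skip any it has already seen. `walk_dirs()` is a hypothetical sketch of that technique. It only helps if the client reports the same inode number for both names; if the CIFS client synthesizes inode numbers locally, the aliased directory may still look new.

```python
import os
from typing import List


def walk_dirs(root: str) -> List[str]:
    """Visit directories under root, following links, but never the same one twice."""
    seen = set()
    order = []
    stack = [root]
    while stack:
        path = stack.pop()
        st = os.stat(path)  # follows symlinks, like a naive recursive scanner
        key = (st.st_dev, st.st_ino)
        if key in seen:
            continue  # already visited under another name: this breaks the loop
        seen.add(key)
        order.append(path)
        with os.scandir(path) as it:
            for entry in it:
                if entry.is_dir():  # follows symlinks by default
                    stack.append(entry.path)
    return order
```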

Cheers, Andreas
