From: David Howells on
Christoph Hellwig <hch(a)infradead.org> wrote:

> Adding Uli to the Cc list to make sure this system call is useful
> for glibc / can be exported by it. Otherwise it's rather pointless
> to add it.
>
> > (6) BSD stat compatibility: Including more fields from the BSD stat such
> > as creation time (st_btime) and inode generation number (st_gen)
> > [Jeremy Allison, Bernd Schubert]
>
> How is this different from (1) and (4)?

A matter of intent, really, and who proposed it.

> > (7) Extra coherency data may be useful in making backups [Andreas Dilger].
>
> What do you mean by that?

There are extra dates and version numbers potentially available. This may be
useful in making backups. Ask Andreas.

> > (8) Allow the filesystem to indicate what it can/cannot provide: A
> > filesystem can now say it doesn't support a standard stat feature if
> > that isn't available.
>
> What for?

So that you can decide not to use it. Some of our filesystems fabricate things
that they don't actually store.

> > (9) Make the fields a consistent size on all arches, and make them large.
>
> Why make them large for the sake of it? We'll need massive changes
> all through libc and applications to ever make use of this. So please
> coordinate the types used with Uli.

Otherwise we end up with #ifdefs and duplicated fields of different sizes
within stat structs, and fields of "long" types which vary in size, depending
on the environment.

I just want to make sure that:

- st_ino is stored as 64-bit
- st_size and st_blocks are stored 64-bit
- st_{a,b,c,m}time.tv_sec are stored 64-bit

We could probably stand to make st_blksize 32-bit. I'd quite like to leave
st_gen as 64-bits and I definitely want to leave st_data_version as 64-bits.
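
As a rough illustration of the fixed-width idea, the layout might look like the
sketch below. This is not the posted patch: the xs_ field prefix (chosen to
avoid clashing with the st_* macros in <sys/stat.h>), the field order and the
padding are all assumptions. The point is only that every field has the same
size on every arch.

```c
#include <stdint.h>

/* Hypothetical fixed-width time value: no "long" anywhere. */
struct xstat_time_sketch {
    uint64_t tv_sec;
    uint64_t tv_nsec;
};

/* Hypothetical fixed-width stat record; names and order are assumptions. */
struct xstat_sketch {
    uint64_t xs_result_mask;
    uint64_t xs_ino;            /* 64-bit inode number */
    uint64_t xs_size;           /* 64-bit size */
    uint64_t xs_blocks;         /* 64-bit block count */
    uint64_t xs_gen;            /* inode generation */
    uint64_t xs_data_version;   /* data version */
    struct xstat_time_sketch xs_atime, xs_btime, xs_ctime, xs_mtime;
    uint32_t xs_blksize;        /* 32-bit is probably enough here */
    uint32_t xs_pad;            /* explicit padding, keeps the size fixed */
};

/* The sizes do not depend on whether "long" is 32 or 64 bits. */
_Static_assert(sizeof(struct xstat_time_sketch) == 16, "time is 2 x 64-bit");
_Static_assert(sizeof(struct xstat_sketch) == 120, "same size on all arches");
```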

> > The following structures are defined for the use of these new system calls:
> >
> > struct xstat_parameters {
> > unsigned long long request_mask;
> > };
>
> Just pass this as a single flag by value. And just make it an unsigned
> long to make the calling convention a lot simpler.

Already done.

> > struct xstat_dev {
> > unsigned int major, minor;
> > };
> >
> > struct xstat_time {
> > unsigned long long tv_sec, tv_nsec;
> > };
>
> No point in adding special types here that aren't generically useful.
> Also this is the first and only system call using split major/minor
> values for the dev_t. All this just creates more churn than it helps.

I can perhaps agree on the device numbers, though some filesystems we have can
store numbers that can't be represented by dev_t. I think, however, everything
we have can be handled by a 32:32 split. The numbers could then be encoded as
desired in userspace.
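
One possible userspace encoding of such a 32:32 pair is sketched below. The
helper names and the choice of packing the major number into the upper half of
a 64-bit value are assumptions made for illustration only; nothing in the
proposal mandates this particular encoding.

```c
#include <stdint.h>

/* Hypothetical helpers: pack a 32:32 major/minor pair into one 64-bit value. */
static inline uint64_t xdev_pack(uint32_t dev_major, uint32_t dev_minor)
{
    return ((uint64_t)dev_major << 32) | dev_minor;
}

/* Recover the major number from the upper 32 bits. */
static inline uint32_t xdev_major(uint64_t packed)
{
    return (uint32_t)(packed >> 32);
}

/* Recover the minor number from the lower 32 bits. */
static inline uint32_t xdev_minor(uint64_t packed)
{
    return (uint32_t)(packed & 0xffffffffu);
}
```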

The problem with using extant time structs is they use "long" or "unsigned
long". And I specifically want to get away from that, since it might be
32-bits or it might be 64-bits.

> >
> > struct xstat {
> > unsigned long long st_result_mask;
>
> Just st_mask?

Perhaps, but it contrasts nicely with request_mask, and makes it easier to
document.

> > unsigned long long st_data_version;
>
> st_version?

Acceptable.

> > unsigned long long st_inode_flags;
>
>
>
> > The defined bits in request_mask and st_result_mask are:
> >
> > XSTAT_REQUEST_MODE Want/got st_mode
> > XSTAT_REQUEST_NLINK Want/got st_nlink
> > XSTAT_REQUEST_UID Want/got st_uid
> > XSTAT_REQUEST_GID Want/got st_gid
> > XSTAT_REQUEST_RDEV Want/got st_rdev
> > XSTAT_REQUEST_ATIME Want/got st_atime
> > XSTAT_REQUEST_MTIME Want/got st_mtime
> > XSTAT_REQUEST_CTIME Want/got st_ctime
> > XSTAT_REQUEST_INO Want/got st_ino
> > XSTAT_REQUEST_SIZE Want/got st_size
> > XSTAT_REQUEST_BLOCKS Want/got st_blocks
> > XSTAT_REQUEST__BASIC_STATS The stuff in the normal stat struct
> > XSTAT_REQUEST_BTIME Want/got st_btime
> > XSTAT_REQUEST_GEN Want/got st_gen
> > XSTAT_REQUEST_DATA_VERSION Want/got st_data_version
> > XSTAT_REQUEST_INODE_FLAGS Want/got st_inode_flags
> > XSTAT_REQUEST__EXTENDED_STATS The stuff in the xstat struct
> > XSTAT_REQUEST__ALL_STATS The defined set of requestables
>
> What's the point of the REQUEST in the name?

Well, they are.

> Also no double underscores inside the identifier. Instead adding a _MASK
> postfix for masks would make it a lot more clear.

Perhaps.

> > The defined bits in st_inode_flags are the usual FS_xxx_FL flags in the
> > LSW, plus some extra flags in the MSW:
> >
> > FS_SPECIAL_FL Special kernel file, such as found in procfs
> > FS_AUTOMOUNT_FL Specific automount point
> > FS_AUTOMOUNT_ANY_FL Free-form automount directory
> > FS_REMOTE_FL File is remote
> > FS_ENCRYPTED_FL File is encrypted
> > FS_SYSTEM_FL File is marked system (DOS/NTFS/CIFS)
> > FS_TEMPORARY_FL File is temporary (NTFS/CIFS)
> > FS_OFFLINE_FL File is offline (CIFS)
>
> Please don't overload the FL_ namespace even more. It's already a
> complete mess given that it overloads the extN on-disk namespace.
> You're much better off just adding a clean new namespace.

Yeah. I've been thinking that's probably the better thing to do.

> > The system calls are:
> >
> > ssize_t ret = xstat(int dfd,
> > const char *filename,
> > unsigned flags,
> > const struct xstat_parameters *params,
> > struct xstat *buffer,
> > size_t buflen);
>
> If you already have a buflen parameter there is absolutely no need for
> the extra results field. Just define new fields at the end and include
> them if the bufsize is big enough and it's in the mask of requested
> fields.

Or, as someone else has already said, return -E2BIG if the result won't fit.
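
A minimal userspace simulation of that rule, assuming a hypothetical result
struct and copy-out helper (neither is taken from the patch): if the caller's
buffer is smaller than the result, fail with -E2BIG rather than truncating.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, cut-down result record used only for this sketch. */
struct xstat_result_sketch {
    unsigned long long st_result_mask;
    unsigned long long st_ino;
    unsigned long long st_size;
};

/*
 * Copy the result into the caller's buffer.
 * Returns the number of bytes written, or -E2BIG if it would not fit.
 */
long xstat_copy_out(void *buf, size_t buflen,
                    const struct xstat_result_sketch *res)
{
    if (buflen < sizeof(*res))
        return -E2BIG;          /* never hand back a truncated record */
    memcpy(buf, res, sizeof(*res));
    return (long)sizeof(*res);
}
```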

> > The request_mask should be set by the caller to specify extra results that
> > the caller may desire. These come in a number of classes:
> >
> > (0) dev, blksize.
> >
> > These are local data and are always available.
> >
> > (1) mode, nlinks, uid, gid, [amc]time, ino, size, blocks.
> >
> > These will be returned whether the caller asks for them or not. The
> > corresponding bits in result_mask will be set to indicate their
> > presence.
> >
> > If the caller didn't ask for them, then they may be approximated. For
> > example, NFS won't waste any time updating them from the server,
> > unless as a byproduct of updating something requested.
>
> Please don't introduce tons of special cases. Instead use a simple rule
> like:
>
> - a filesystem must return all attributes requested, or return an
> error if it can't.
> - a filesystem may return additional attributes, the caller can detect
> this by looking at st_mask.
>
> plus possibly a list of attributes the filesystem must be able to
> provide if requested. I don't see a reason to make that mask different
> from the attributes required by Posix.

Firstly: Lightweight stat: I want to say that the filesystem may return data
that is out of date if it isn't asked for specifically, but the filesystem has
a copy available. But I'm not sure that this should apply to non-standard
fields.

Secondly: It doesn't matter what POSIX wants; not all filesystems we support
have everything available. Where something that's standard is not available,
we have the opportunity to indicate this whilst still providing a fabricated
result. The user can take note of the indication if they choose to, or ignore
it entirely and just use the fabricated value.
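
From the caller's side, honouring or ignoring that indication could look
something like the sketch below. The XSTAT_REQUEST_* names are the ones from
the proposal, but the bit positions, the cut-down struct and the helper are
assumptions made up for this example.

```c
#include <stdint.h>

/* Bit positions are assumptions; only the names come from the proposal. */
#define XSTAT_REQUEST_MODE    (1ULL << 0)
#define XSTAT_REQUEST_BLOCKS  (1ULL << 10)

/* Hypothetical cut-down view of the returned record. */
struct xstat_view {
    uint64_t st_result_mask;    /* which fields the fs really provided */
    uint64_t xs_blocks;         /* always filled in, possibly fabricated */
};

/*
 * Store the block count in *blocks and return 1 only if the filesystem
 * said it actually has one. Return 0 if the value is merely fabricated;
 * a caller that doesn't care can read xs_blocks directly instead.
 */
int xstat_real_blocks(const struct xstat_view *xs, uint64_t *blocks)
{
    if (!(xs->st_result_mask & XSTAT_REQUEST_BLOCKS))
        return 0;
    *blocks = xs->xs_blocks;
    return 1;
}
```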

David
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: David Howells on
Linus Torvalds <torvalds(a)linux-foundation.org> wrote:

> > The new information is useful for some cases. Samba for example. At
> > least two of the fields I'm adding are also made available through BSD's
> > stat() call, and will automatically be used for some things by autoconf
> > magic if they become available.
>
> .. that's a pointless argument. If the only way something gets used is
> through autoconf, then clearly nobody cares.

That's not what I meant at all. I meant there may be things out there that
will just use st_btime and st_gen as soon as they appear without anything
having to be done to them because these fields already exist in the BSD stat
struct.

Samba is one such example. It will use st_btime immediately if it
exists as the SMB protocol wants to pass the creation time around.

> Yeah, maybe it adds a flag to "ls", but let's face it - that isn't actually
> _buying_ anything.

Not having ls cause a mass automount just because you did an ls of a directory
full of automount points would be very nice.

> So the only thing that matters for new system calls is who actually
> really seriously wants to use the information, even if it's not there
> by default. Is it _anybody_ else than samba?

Perhaps. As previously mentioned, BSD (and other unices) already make some of
these fields available (notably st_btime and st_gen). We could also make a
BSD-compatible st_flags available.

> In other words, in the absence of some seriously generic users, it
> sounds more like an ioctl to me to ask for something like "creation
> time" or "inode version", when not all filesystems support anything
> like that.

I initially did them by getxattr(), but that didn't go down too well.

> Ask your samba people, for example, if they'd _ever_ do just a "xstat()"?

I suspect they would, though maybe they can say otherwise. What about SMB
directory enumeration? I believe that is effectively getdents-with-stat.
Having to do open+stat for each file for that would be painful.

David
From: Andreas Dilger on
On 2010-07-19, at 11:46, Linus Torvalds wrote:
> On Mon, Jul 19, 2010 at 10:26 AM, David Howells <dhowells(a)redhat.com> wrote:
>> I suspect they would, though maybe they can say otherwise. What about SMB
>> directory enumeration? I believe that is effectively getdents-with-stat.
>> Having to do open+stat for each file for that would be painful.
>
> Yeah, but do you need xstat information at all for something like
> that? Most people try very hard to make do with the information
> returned by readdir itself (d_type and inode number), because if you
> end up looking up each name you've already pretty much lost in a
> performance model.

This lightweight stat() interface is exactly what is needed for things like
"color ls", which is the default on all distros today. "ls --color" always
does a stat on each file just to get the file mode, so that it can color
executable files differently. For Lustre and other distributed filesystems,
getting things like the current file size is hard work (i.e. multiple RPCs
per file), yet "ls" doesn't care about the size or modification times unless
"ls -l" is used. The same goes for "find".

> (And I do agree that a "readdirplus()" is probably something that a
> lot of server people would find useful, but obviously that's another
> cross-filesystem nightmare. Only a few filesystems can cheaply give
> you anything but d_type/d_ino, and not all do even that),

Having a readdirplus() syscall would be even better, but again only with the
ability to request specific attributes. Otherwise the filesystem may be doing
a lot of extra work to collect all of the file attributes, and then userspace
will probably be throwing most of them away.

Cheers, Andreas
From: David Howells on
Jan Engelhardt <jengelh(a)medozas.de> wrote:

> Linux already has a creation time field, it's called otime (there is no "b"
> in "creation"), and you will find scattered fragments of that all over the
> kernel (foremost, fs/jfs/, now btrfs, and I also notice sysvipc having
> something with that name).

It is? It's called crtime in Ext4. st_btime, however, would be compatible
with BSD's stat, and Samba would just use it by way of autoconf magic if it
appeared.

David
From: David Howells on
Jan Engelhardt <jengelh(a)medozas.de> wrote:

> >> (8) Allow the filesystem to indicate what it can/cannot provide: A
> >> filesystem can now say it doesn't support a standard stat feature if
> >> that isn't available.
> >
> >What for?
>
> Given xstat.otime=0, how would you determine whether the file is really
> tagged with a date of 1970, or whether it's just the fs which did not
> store this kind of information?

I was thinking more of stuff that's already in the Linux stat struct, some of
which is fabricated because the underlying fs doesn't support it.

Take RomFS for example: it fabricates all of st_mtime, st_atime, st_ctime,
st_nlink, st_blocks, st_uid and st_gid because none of them are stored on the
medium.

Similarly, UbiFS fabricates st_blocks and complains in a comment that it makes
no sense for that type of filesystem.

There are other examples.

David