From: Arnd Bergmann on
On Thursday 15 July 2010 04:17:12 David Howells wrote:
> (10) Can be extended by using more request flags and tagging further data after
> the end of the standard return data. Such things as the following could
> be returned:
>
> - BSD st_flags or FS_IOC_GETFLAGS.
> - Volume ID / Remote Device ID [Steve French].
> - Time granularity (NFSv4 time_delta) [Steve French].
> - Mask of features available on file (eg: ACLs, seclabel) [Brad Boyer,
> Michael Kerrisk].
>
> This was initially proposed as a set of xattrs, but the general preferance is
> for an extended stat structure.

I don't think I'd call this general preference. Three of the four
are fixed length and could easily be done inside the structure if you
leave a bit of space instead of a variable-length field at the end.

For the volume id, I could not find any file system that requires more
than 32 bytes here, which is also reasonable to put into the structure.
Make it 36 if you want to cover ascii encoded UUIDs.

That's at most 60 bytes for the extensions you're considering already,
plus the 152 you have already is still less than a cache line on
some machines. Padding it to 256 bytes would make it nice and round,
if you want to be really sure, make it 384 bytes.

> The following structures are defined for the use of these new system calls:
>
> struct xstat_parameters {
> unsigned long long request_mask;
> };

I'd also still argue that 32 bits would be better since you can
put them into the argument list instead of having to use a pointer
to xstat_parameters. You only use 15 bits so far, so the remaining
17 bits should go a long way. It's not as important to me as the
previous point though.

> The system calls are:
>
> ssize_t ret = xstat(int dfd,
> const char *filename,
> unsigned flags,
> const struct xstat_parameters *params,
> struct xstat *buffer,
> size_t buflen);

The resulting syscall I'd hope for would be

int xstat(dfd, const char *filename, unsigned flags,
unsigned mask, struct xstat *buf);

Everything else in your patch looks very good and has my full support.

Arnd
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: David Howells on
Arnd Bergmann <arnd(a)arndb.de> wrote:

> > This was initially proposed as a set of xattrs, but the general preferance
> > is for an extended stat structure.
>
> I don't think I'd call this general preference. Three of the four
> are fixed length and could easily be done inside the structure if you
> leave a bit of space instead of a variable-length field at the end.

?

Maybe I wasn't clear: I meant having an extended stat() syscall rather than
using a bunch of getxattr()'s was the general preference.

> For the volume id, I could not find any file system that requires more
> than 32 bytes here, which is also reasonable to put into the structure.
> Make it 36 if you want to cover ascii encoded UUIDs.

You should also include a length. Volume IDs may be binary rather than
NUL-terminated strings.

> That's at most 60 bytes for the extensions you're considering already,
> plus the 152

160, I think.

> you have already is still less than a cache line on
> some machines. Padding it to 256 bytes would make it nice and round,
> if you want to be really sure, make it 384 bytes.

Which we currently allocate on the kernel stack, plus up to a couple of kstat
structs if something like eCryptFS is used. Admittedly, the base xstat struct
could be kmalloc()'d instead, but why use up all that space if you don't need
it?

David
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: David Howells on
Mark Harris <mhlk(a)osj.us> wrote:

> > struct xstat_time {
> > unsigned long long tv_sec, tv_nsec;
> > };
>
> unsigned? Existing filesystems support on-disk timestamps
> representing times prior to the epoch.

I suppose it doesn't hurt to make is signed. It's large enough...

Looking at it again, having a 64-bit field for tv_nsec is overkill. It can't
(or shouldn't) exceed 999,999,999 - well within the capability of a 32-bit
unsigned integer.

So how about using up the dead space for what Steve French wanted:

| One hole that this reminded me about is how to return the superblock
| time granularity (for NFSv4 this is attribute 51 "time_delta" which
| is called on a superblock not on a file). We run into time rounding
| issues with Samba too.

By doing something like:

struct xstat_time {
signed long long tv_sec;
unsigned int tv_nsec;
unsigned short tv_granularity;
unsigned short tv_gran_units;
};

Where tv_granularity is the minimum granularity for tv_sec and tv_nsec given
as a quantity of tv_gran_units. tv_gran_units could then be a constant, such
as:

XSTAT_NANOSECONDS_GRANULARITY
XSTAT_MICROSECONDS_GRANULARITY
XSTAT_MILLISECONDS_GRANULARITY
XSTAT_SECONDS_GRANULARITY
XSTAT_MINUTES_GRANULARITY
XSTAT_HOURS_GRANULARITY
XSTAT_DAYS_GRANULARITY

So, for example, FAT times are a 2s granularity, so FAT would set
tv_granularity to 2 and tv_gran_units to XSTAT_SECONDS_GRANULARITY.

We could even support picosecond granularity if we made tv_nsec a 5-byte
field (tv_psec):

struct xstat_time {
signed long long tv_sec;
unsigned long long tv_gran_units : 8;
unsigned long long tv_granularity : 16;
unsigned long long tv_psec : 48;
};

but that's probably excessive. Does any filesystem we currently support need
that?

David
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Arnd Bergmann on
On Thursday 15 July 2010, David Howells wrote:
> Arnd Bergmann <arnd(a)arndb.de> wrote:
>
> > > This was initially proposed as a set of xattrs, but the general preferance
> > > is for an extended stat structure.
> >
> > I don't think I'd call this general preference. Three of the four
> > are fixed length and could easily be done inside the structure if you
> > leave a bit of space instead of a variable-length field at the end.
>
> ?
>
> Maybe I wasn't clear: I meant having an extended stat() syscall rather than
> using a bunch of getxattr()'s was the general preference.

Ok, I misparsed your statement there. I don't think anyone was
objecting the use of xstat for this.

The controversial part is only how the extension happens. I would
already feel better about it if you just dropped the
'unsigned long long st_extra_results[0];' at the end and
added a comment saying that the structure may grow in the future, though
my preference would be to space for extensions and make it fixed length.

> > For the volume id, I could not find any file system that requires more
> > than 32 bytes here, which is also reasonable to put into the structure.
> > Make it 36 if you want to cover ascii encoded UUIDs.
>
> You should also include a length. Volume IDs may be binary rather than
> NUL-terminated strings.

Yes, maybe. There are several possible encodings for this. I was actually
thinking of fixed-length string rather than zero-terminated, but that
is possible as well. If this gets added, we need to audit every possible
use to make sure each of them is covered. My point was mostly that if we
need at most 40 bytes, it doesn't have to be variable length at all.

> > That's at most 60 bytes for the extensions you're considering already,
> > plus the 152
>
> 160, I think.

right.

> > you have already is still less than a cache line on
> > some machines. Padding it to 256 bytes would make it nice and round,
> > if you want to be really sure, make it 384 bytes.
>
> Which we currently allocate on the kernel stack, plus up to a couple of kstat
> structs if something like eCryptFS is used. Admittedly, the base xstat struct
> could be kmalloc()'d instead, but why use up all that space if you don't need
> it?

If you're worried about stack utilization, xstat could also be embedded into
kstat, like

struct kstat {
u64 request_mask;
struct xstat x;
};

Then you only need one of them on the stack for sys_xstat, or have both
struct kstat and struct stat/stat64 for the other syscalls.

Arnd
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Arnd Bergmann on
On Friday 16 July 2010, David Howells wrote:
> Mark Harris <mhlk(a)osj.us> wrote:
> So how about using up the dead space for what Steve French wanted:
>
> | One hole that this reminded me about is how to return the superblock
> | time granularity (for NFSv4 this is attribute 51 "time_delta" which
> | is called on a superblock not on a file). We run into time rounding
> | issues with Samba too.
>
> By doing something like:
>
> struct xstat_time {
> signed long long tv_sec;
> unsigned int tv_nsec;
> unsigned short tv_granularity;
> unsigned short tv_gran_units;
> };

I like that!

> Where tv_granularity is the minimum granularity for tv_sec and tv_nsec given
> as a quantity of tv_gran_units. tv_gran_units could then be a constant, such
> as:
>
> XSTAT_NANOSECONDS_GRANULARITY
> XSTAT_MICROSECONDS_GRANULARITY
> XSTAT_MILLISECONDS_GRANULARITY
> XSTAT_SECONDS_GRANULARITY
> XSTAT_MINUTES_GRANULARITY
> XSTAT_HOURS_GRANULARITY
> XSTAT_DAYS_GRANULARITY
>
> So, for example, FAT times are a 2s granularity, so FAT would set
> tv_granularity to 2 and tv_gran_units to XSTAT_SECONDS_GRANULARITY.

You could also define the tv_gran_units to be power-of-ten nanoseconds,
making it a decimal floating point number like

enum {
XSTAT_NANOSECONDS_GRANULARITY = 0,
XSTAT_MICROSECONDS_GRANULARITY = 3,
XSTAT_MILLISECONDS_GRANULARITY = 6,
XSTAT_SECONDS_GRANULARITY = 9,
};

That would make it easier to define an xstat_time_before() function, though
it means that you could no longer do XSTAT_MINUTES_GRANULARITY and
higher directly other than { .tv_gran_units = 10, .tv_granularity = 6, }.

> We could even support picosecond granularity if we made tv_nsec a 5-byte
> field (tv_psec):
>
> struct xstat_time {
> signed long long tv_sec;
> unsigned long long tv_gran_units : 8;
> unsigned long long tv_granularity : 16;
> unsigned long long tv_psec : 48;
> };
>
> but that's probably excessive. Does any filesystem we currently support need
> that?

I wouldn't even go that far if we needed sub-ns (I don't think we do), because
that breaks old compilers that cannot do bit fields.

Arnd
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/