Extended file stat functions [ver #2] [Kernel]


From: Arnd Bergmann on 30 Jun 2010 13:40

On Wednesday 30 June 2010, Jeff Layton wrote:
> In the event that we approach running out of flags, we could even use
> the last flag as a "HAS_FLAGS2" flag, to add a new flags field at the
> end. Ugly, but it would avoid the need for a new syscall. We can kick
> that potential problem down the road though. With 64 flags to play
> with, it likely won't be a problem for a while.

Along the lines of what Christoph argued, we can also just use the
new syscall when that happens.

Arnd
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

From: Andreas Dilger on 30 Jun 2010 17:50

On 2010-06-30, at 06:05, David Howells wrote:
> Andreas Dilger <adilger(a)dilger.ca> wrote:
>> In struct stat64 it uses "unsigned long" for both st_uid and st_gid. Having
>> a 64-bit value here is useful for CIFS servers to be able to remap different
>> UID domains into a 32-bit domain and a 32-bit UID. If you change this,
>> please remember to reorder the fields for proper 64-bit alignment.
>
> glibc, on the other hand, only supports 32-bits for these.

For the cost of those extra bytes it would definitely save a lot of extra complexity in every application packing and unpacking the struct. At a minimum put a 32-bit padding that is zero-filled for now.

>> I wouldn't object to having a 128-bit st_ino field, since this is what
>> Lustre will be using internally in the next release.
>
> so perhaps something like:
>
> struct xstat_u128 { unsigned long long lsw, msw; };
>
> however, I suspect the kernel will require a bit of reengineering to handle a
> pgoff_t and loff_t of 128-bits.

Well, not any different from having 32-bit platforms work with two 32-bit values for 64-bit offsets today, except that we would be doing this with two 64-bit values.
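To illustrate the two-word approach, here is a minimal sketch of arithmetic on the struct xstat_u128 proposed above. The helper names u128_add and u128_cmp are hypothetical and do not come from any posted patch, and the sketch assumes unsigned long long is exactly 64 bits wide.

```c
/* The struct as proposed in the thread: least- and most-significant words. */
struct xstat_u128 {
    unsigned long long lsw, msw;
};

/* Hypothetical helper: add two 128-bit values, carrying out of the low word. */
static struct xstat_u128 u128_add(struct xstat_u128 a, struct xstat_u128 b)
{
    struct xstat_u128 r;

    r.lsw = a.lsw + b.lsw;
    /* Unsigned wraparound of the low word means a carry into the high word. */
    r.msw = a.msw + b.msw + (r.lsw < a.lsw);
    return r;
}

/* Hypothetical helper: three-way compare, returning -1, 0 or 1. */
static int u128_cmp(struct xstat_u128 a, struct xstat_u128 b)
{
    if (a.msw != b.msw)
        return a.msw < b.msw ? -1 : 1;
    if (a.lsw != b.lsw)
        return a.lsw < b.lsw ? -1 : 1;
    return 0;
}
```

This is the same pattern 32-bit code uses for 64-bit values, only with wider words. The cost is that every comparison and addition has to go through helpers like these.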

>> What is also very convenient that I learned Solaris stat() does is it
>> returns the device size in st_size for a block device file. This is very
>> convenient, and avoids the morass of ioctls and "binary llseek guessing"
>> used by libext2fs and libblkid to determine the size of a block device. Any
>> reason not to add this into this new syscall?
>
> That's a separate problem. That can be implemented now by overriding getattr
> on blockdev files. You could also set st_blocks and st_blksize to indicate
> parameters of the blockdev - though that may upset df, I suppose.

I don't know whether Solaris does that; I'd have to check with someone who has more than an anecdotal understanding of it. Actually, a quick Google search shows that st_blocks and st_blksize are undefined for block/char devices.

>>> #define XSTAT_QUERY__ORDINARY_SET 0x00000017ULL
>>> #define XSTAT_QUERY__GET_ANYWAY 0x0000007fULL
>>
>> Could you provide some information what the semantic distinction between
>> these is? It might be useful to have an "XSTAT_QUERY_LEGACY_STAT" mask that
>> returns only the fields that are in the previous struct stat, unless that is
>> what "ORDINARY_SET" means, in which case it should be renamed I think.
>
> XSTAT_QUERY_LEGACY_STAT is XSTAT_QUERY__ORDINARY_SET. Is "legacy" an
> appropriate appellation, though? They're the set most people expect to see
> and want to use.

I was thinking that most applications using this interface would use it because they have a specific need to, or it would be internal to glibc. In those cases it is useful to know what the "traditional" stat() returned, but I don't think "__ORDINARY_SET" encompasses that idea. Other possibilities include "NORMAL_STAT" or "BASIC_STAT", or similar.

>>> #define XSTAT_QUERY__DEFINED_SET 0x0000007fULL
>>
>> It is smart to have a "DEFINED_SET" mask that maps to the
>> currently-understood fields. This ensures that applications compiled
>> against a specific set of headers/struct will not request fields which they
>> don't understand. It might be better to call this "XSTAT_QUERY_ALL" so that
>> it is more easily understood and used by callers, instead of the incorrect
>> "-1" or "~0" that some may be tempted to use if they don't understand what
>> "__DEFINED_SET" means.
>
> Passing -1 (or ULONGLONG_MAX) to get everything would be reasonable.

NOOOO. That is exactly what we _don't_ want, since it makes it impossible for the kernel to actually understand which fields the application is ready to handle. If the application always uses XSTAT_QUERY_ALL, instead of "-1", then the kernel can easily tell which fields are present in the userspace structure, and what it should avoid touching.

If applications start using "-1" to mean "all fields", then it will work so long as the kernel and userspace agree on the size of struct xstat, but as soon as the kernel understands some new field, but userspace does not, the application will segfault or clobber random memory because the kernel thinks it is asking for XSTAT_QUERY_NEXT_NEW_FIELD|... when it really isn't asking for that at all.
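The failure mode can be sketched as follows. The 0x7f value is the XSTAT_QUERY__DEFINED_SET quoted above. The names XSTAT_QUERY_ALL_V1, XSTAT_QUERY_NEW_FIELD and KERNEL_SUPPORTED, and both helper functions, are assumptions made up for illustration and are not part of the proposed interface.

```c
#include <stdint.h>

/* Fields known to the headers the application was built against. */
#define XSTAT_QUERY_ALL_V1     0x0000007fULL
/* Hypothetical field that only a later kernel knows about. */
#define XSTAT_QUERY_NEW_FIELD  0x00000080ULL
/* What that later kernel is able to fill in. */
#define KERNEL_SUPPORTED       (XSTAT_QUERY_ALL_V1 | XSTAT_QUERY_NEW_FIELD)

/* Fields the newer kernel would try to write for a given request mask. */
static uint64_t fields_kernel_writes(uint64_t request)
{
    return request & KERNEL_SUPPORTED;
}

/* Non-zero if the kernel would write a field that the old userspace
 * struct has no room for. */
static int overruns_old_struct(uint64_t request)
{
    return (fields_kernel_writes(request) & ~XSTAT_QUERY_ALL_V1) != 0;
}
```

With the named mask, overruns_old_struct(XSTAT_QUERY_ALL_V1) is 0. With an all-ones request it is 1, because the kernel believes the caller asked for the new field.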

Cheers, Andreas


From: David Howells on 30 Jun 2010 19:20

Andreas Dilger <adilger(a)dilger.ca> wrote:

> For the cost of those extra bytes it would definitely save a lot of extra
> complexity in every application packing and unpacking the struct. At a
> minimum put a 32-bit padding that is zero-filled for now.

Blech. I'd prefer to just expand the fields to 64-bits.

Note that you can't just arbitrarily pass a raw 64-bit UID, say, back to
vfs_getattr() and expect it to be coped with. Those stat syscalls that return
32-bit (or even 16-bit) would have to do something with it, and glibc would
have to do something with it.

I think we'd need extra request bits to ask for the longer UID/GID - at which
point the extra result data can be appended and extra capacity in the basic
part of the struct is not required.

> > so perhaps something like:
> >
> > struct xstat_u128 { unsigned long long lsw, msw; };
> >
> > however, I suspect the kernel will require a bit of reengineering to handle
> > a pgoff_t and loff_t of 128-bits.
>
> Well, not any different from having 32-bit platforms work with two 32-bit
> values for 64-bit offsets today, except that we would be doing this with two
> 64-bit values.

gcc for 32-bit platforms can handle 64-bit numbers. gcc doesn't handle 128-bit
numbers.

This can be handled as suggested above by allocating extra result bits to get
the upper halves of longer fields:

XSTAT_REQUEST_SIZE__MSW
XSTAT_REQUEST_BLOCKS__MSW

for example.
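A rough sketch of how userspace might consume such a split field. The thread names the flag but assigns no value, so the bit value, the struct layout and the helper name below are all assumptions for illustration.

```c
#include <stdint.h>

/* Hypothetical bit value; the thread does not assign one. */
#define XSTAT_REQUEST_SIZE__MSW  0x0200ULL

/* Hypothetical layout: the low half sits in the basic part of the struct,
 * the upper half is appended extra result data. */
struct xstat_size_sketch {
    uint64_t result_mask;   /* bits the kernel actually filled in */
    uint64_t st_size;       /* low 64 bits of the size */
    uint64_t st_size_msw;   /* upper 64 bits, valid only if the MSW bit is set */
};

/* Return the upper half of the size, treating it as zero when the kernel
 * did not supply it. */
static uint64_t size_upper_half(const struct xstat_size_sketch *st)
{
    if (st->result_mask & XSTAT_REQUEST_SIZE__MSW)
        return st->st_size_msw;
    return 0;
}
```

Callers that never set the MSW request bit keep working with the plain 64-bit field and never look at the appended half.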

> > Passing -1 (or ULONGLONG_MAX) to get everything would be reasonable.
>
> NOOOO. That is exactly what we _don't_ want, since it makes it impossible
> for the kernel to actually understand which fields the application is ready
> to handle. If the application always uses XSTAT_QUERY_ALL, instead of "-1",
> then the kernel can easily tell which fields are present in the userspace
> structure, and what it should avoid touching.
>
> If applications start using "-1" to mean "all fields", then it will work so
> long as the kernel and userspace agree on the size of struct xstat, but as
> soon as the kernel understands some new field, but userspace does not, the
> application will segfault or clobber random memory because the kernel thinks
> it is asking for XSTAT_QUERY_NEXT_NEW_FIELD|... when it really isn't asking
> for that at all.

As long as the field bits are allocated in order and the extra results are tacked
on in bit number order, will it actually be a problem? Userspace must know how
to deal with all the bits up to the last one it knows about; anything beyond
that is irrelevant.

What would you have me do? Return an error if a request is made that the
kernel doesn't support? That's bad too. This can be handled simply by
clearing the result bit for any unsupported field.
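A minimal sketch of that behaviour, assuming a kernel that knows only the seven fields in the 0x7f mask quoted earlier. The struct, the macro name and both helpers are hypothetical.

```c
#include <stdint.h>

/* Assumed set of fields this kernel can supply. */
#define KERNEL_SUPPORTED_FIELDS  0x0000007fULL

/* Hypothetical reply carrying only the result mask. */
struct xstat_reply {
    uint64_t result_mask;
};

/* Kernel side: report only the requested fields that are supported,
 * clearing the result bit for everything else instead of failing. */
static void fill_result_mask(uint64_t request, struct xstat_reply *reply)
{
    reply->result_mask = request & KERNEL_SUPPORTED_FIELDS;
}

/* Userspace side: a field may be read only if its result bit is set. */
static int field_valid(const struct xstat_reply *reply, uint64_t bit)
{
    return (reply->result_mask & bit) != 0;
}
```

A request with unknown bits set then succeeds, and the caller learns from the result mask which fields were actually filled in.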

David

From: H. Peter Anvin on 30 Jun 2010 19:30

On 06/30/2010 04:15 PM, David Howells wrote:
>
> gcc for 32-bit platforms can handle 64-bit numbers. gcc doesn't handle 128-bit
> numbers.
>

gcc for 64-bit platforms does handle 128-bit numbers, but I don't think
it does on 32-bit platforms.

-hpa

From: David Howells on 30 Jun 2010 20:20

H. Peter Anvin <hpa(a)zytor.com> wrote:

> gcc for 64-bit platforms does handle 128-bit numbers, but I don't think
> it does on 32-bit platforms.

How do you specify them? If I say "long long long" gcc moans that it can't
support it on x86_64.
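For reference, current GCC releases spell the type __int128 (or unsigned __int128) on targets that support it, and predefine __SIZEOF_INT128__ there. Older compilers on x86_64 offered the __int128_t and __uint128_t typedefs instead. A minimal sketch, with the typedef and helper names being hypothetical:

```c
#include <stdint.h>

#ifdef __SIZEOF_INT128__   /* defined by GCC where the 128-bit type exists */
typedef unsigned __int128 u128;

/* Build a 128-bit value from its most- and least-significant words. */
static u128 u128_from_halves(uint64_t msw, uint64_t lsw)
{
    return ((u128)msw << 64) | lsw;
}

/* Extract the upper 64 bits. */
static uint64_t u128_high(u128 v)
{
    return (uint64_t)(v >> 64);
}

/* Extract the lower 64 bits. */
static uint64_t u128_low(u128 v)
{
    return (uint64_t)v;
}
#endif
```

On 32-bit targets the macro is typically not defined, so portable code still needs a two-word fallback.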

David
