From: Jeff Garzik on
On 06/25/2010 12:01 AM, Nick Piggin wrote:
> So is "frsize" supposed to be the optimal block size, or what?
> f_bsize AFAIKS should be filesystem allocation block size because
> apparently some programs require it to calculate size of file on
> disk.
>
> If we can't change existing suboptimal legacy things, then let's
> introduce new APIs that do the right thing. Apps that care will
> eventually start using eg. a new syscall.
>
>>
>>> - statvfs(2) lacks f_type.
>>>
>>> Is there anything more we should add here? Samba wants a capabilities
>>> field, with things like sparse files, quotas, compression, encryption,
>>> case preserving/sensitive.
>>
>> It wouldn't be a bad idea, but then you could get into issues of what exactly the above flags mean. That said, I think it is better to have broad categories of features that may be slightly ill-defined than having nothing at all.
>
> Yes it would be tricky. I don't want to add features that will just
> be useless or go unused, but I don't want to change the syscall API
> just to add f_flags, without looking at other possibilities.


It would be nice to separate capabilities and fixed parameters (block
size) from statistics which change frequently (free space).

And are capabilities really suited to a C struct, at all? That seems
more suited to a key/value type interface, a la NFSv4 attributes.

Jeff



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Christoph Hellwig on
On Thu, Jun 24, 2010 at 05:06:45PM -0600, Andreas Dilger wrote:
> I think the right solution for this issue is to (gradually) start enforcing the "uniqueness" of the UUID in the filesystem superblock. That is what it is supposed to be for. Using (fsid, st_inode) doesn't necessarily help anything, if "fsid" isn't unique, and the same "st_inode" number is used on two different mountpoints.
>
> To start, tracking the UUID at mount time and printing a non-fatal error if the mounted UUID is not unique would help, as would having e.g. fsck track the UUIDs of the underlying filesystems and print a non-fatal error if it hits a duplicate UUID.
>
> At some point in the future, the kernel can be changed to refuse to mount a filesystem with a duplicate UUID. I believe mount.xfs already does this.

Tracking and exposing the uuid to be exact. Having the full uuid in a
statfs/statvfs-like system call is one first step. And yes, XFS does
check the uuid during mount. But it's actually in kernelspace, not in a
mount helper which XFS doesn't have. Take a look at xfs_uuid_mount().
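
A rough userspace illustration of a mount-time uniqueness check follows.
It is not the code in xfs_uuid_mount(); it only shows the general idea of
remembering the UUIDs already seen and rejecting a repeat. The names
struct uuid_table and uuid_table_register(), and the fixed 64-entry limit,
are hypothetical and exist only for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define UUID_LEN   16   /* a UUID is 16 raw bytes */
#define MAX_MOUNTS 64   /* arbitrary limit chosen for this sketch */

/* Hypothetical table of the UUIDs of currently mounted filesystems. */
struct uuid_table {
    unsigned char uuids[MAX_MOUNTS][UUID_LEN];
    size_t count;
};

/*
 * Try to record a UUID at "mount" time.
 * Returns 0 if the UUID was new and has been recorded,
 * -1 if the same UUID is already in the table (duplicate),
 * -2 if the table is full.
 */
static int uuid_table_register(struct uuid_table *t,
                               const unsigned char uuid[UUID_LEN])
{
    assert(t != NULL && uuid != NULL);

    for (size_t i = 0; i < t->count; i++) {
        if (memcmp(t->uuids[i], uuid, UUID_LEN) == 0)
            return -1;
    }
    if (t->count == MAX_MOUNTS)
        return -2;

    memcpy(t->uuids[t->count], uuid, UUID_LEN);
    t->count++;
    return 0;
}
```

A caller could treat the -1 return either as a warning or as a reason to
refuse the mount, which matches the gradual enforcement proposed in the
quoted text.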

From: Andreas Dilger on
On 2010-06-24, at 22:01, Nick Piggin wrote:
> On Thu, Jun 24, 2010 at 05:13:38PM -0600, Andreas Dilger wrote:
>>> Other than types, other differences are:
>>> - statvfs(2) has f_frsize, which seems fairly useless.
>>
>> Actually, we were just lamenting the fact that f_frsize is currently broken, because Lustre wants to export the IO size as 1MB for good RPC performance, but the underlying blocksize is 4kB (ext3 blocksize). Similarly, NFS might want to export the rsize/wsize of 32kB or 64kB even if the underlying filesystem blocksize is smaller.
>>
>>
>>> - statfs(2) f_bsize is optimal transfer block, statvfs(2) f_bsize is fs
>>> block size. The latter could be useful for disk space algorithms.
>>> Both can be ill-defined.
>>
>> According to POSIX, "f_bsize" is the blocksize, but unfortunately this was

Doh, typo. "f_frsize" is the "blocksize" (i.e. the units of f_blocks), and "f_bsize" is the "optimal IO size".

The SUSv2 includes the following field definitions (not showing all of them):
> unsigned long f_bsize    file system block size
> unsigned long f_frsize   fundamental filesystem block size
> fsblkcnt_t    f_blocks   total number of blocks on file system
>                          in units of f_frsize

>> botched in the earlier Linux implementations so currently they are both set to the same value, and using anything other than that breaks userspace programs that get them mixed up.
>
> So is "frsize" supposed to be the optimal block size, or what?

No, "frsize" is the minimum allocation unit - it is "fragment size".

> f_bsize AFAIKS should be filesystem allocation block size because
> apparently some programs require it to calculate size of file on
> disk.

The statvfs()/struct statvfs API clearly documents that f_blocks is in units of f_frsize, but since this is a relatively new API on Linux, and statfs() used f_bsize for years to mean the same thing, some applications are broken.

> If we can't change existing suboptimal legacy things, then let's
> introduce new APIs that do the right thing. Apps that care will
> eventually start using eg. a new syscall.

I'd rather NOT start a proliferation of redundant syscalls, since there is no expectation that they will be used correctly either, and it just makes applications less portable. I think it is less effort to fix the few current applications using sys_statvfs() incorrectly to use f_frsize than to use some new Linux-only syscall.

>> It wouldn't be a bad idea, but then you could get into issues of what exactly the above flags mean. That said, I think it is better to have broad categories of features that may be slightly ill-defined than having nothing at all.
>
> Yes it would be tricky. I don't want to add features that will just
> be useless or go unused, but I don't want to change the syscall API
> just to add f_flags, without looking at other possibilities.

SUSv2 only defines the flags ST_RDONLY and ST_NOSUID, and this is also what is documented in the Linux/BSD/OSX statvfs(3) man page. According to the Solaris statvfs(3) man page I found it additionally defines:

ST_NOTRUNC   0x04    /* does not truncate file names longer than
                        NAME_MAX */

Cheers, Andreas





From: Ulrich Drepper on
On Fri, Jun 25, 2010 at 10:47, Andreas Dilger <adilger(a)dilger.ca> wrote:
> SUSv2 only defines the flags ST_RDONLY and ST_NOSUID, and this is also what is documented in the Linux/BSD/OSX statvfs(3) man page.  According to the Solaris statvfs(3) man page I found it additionally defines:
>
> ST_NOTRUNC   0x04    /* does not truncate file names longer than
>                        NAME_MAX */

glibc supports many more flags. SUS of course has to restrict itself;
there are not that many flags which are portable and available on all
the platforms. Look at /usr/include/bits/statvfs.h for what has to be
supported and the values to use. If the values the kernel will use
differ I'd have to (unnecessarily) convert the values. If some values
are missing/not supported I still would have to use /proc/mounts and
nothing is gained.
From: Christoph Hellwig on
On Fri, Jun 25, 2010 at 10:52:05AM -0700, Ulrich Drepper wrote:
> there are not that many flags which are portable and available on all
> the platforms. Look at /usr/include/bits/statvfs.h for what has to be
> supported and the values to use. If the values the kernel will use
> differ I'd have to (unnecessarily) convert the values. If some values
> are missing/not supported I still would have to use /proc/mounts and
> nothing is gained.

I don't quite get what ST_WRITE is supposed to mean. All but that one
can be supported trivially.