From: Miklos Szeredi on
On Thu, 24 Jun 2010, Nick Piggin wrote:
> This has come up a few times in the past, and I'd like to try to get
> an agreement on it. statvfs(2) importantly contains f_flag (mount
> flags), and its use is encouraged over statfs(2). The kernel
> provides a statfs syscall only.
>
> This means glibc has to provide f_flag support by parsing /proc/mounts
> and stat(2)ing mount points. This is really slow, and /proc/mounts is
> hard for the kernel to provide. It's actually the last scalability
> bottleneck in the core vfs for dbench (samba) after my patches.
>
> Not only that, but it's racy.
>
> Other than types, other differences are:
> - statvfs(2) has f_frsize, which seems fairly useless.

statfs(2) also has f_frsize since 2.6.0, only it hasn't been
documented (should be fixed now).

> - statvfs(2) has f_favail.
> - statfs(2) f_bsize is optimal transfer block, statvfs(2) f_bsize is fs
> block size. The latter could be useful for disk space algorithms.
> Both can be ill defined.

They are the same, only the documentation is different.
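
For readers who want to look at the fields under discussion, here is a minimal Python sketch using os.statvfs(), which goes through the C library's statvfs(). The helper names free_bytes and is_read_only are made up for illustration, and the choice of "/" as the path is an assumption; the sketch only relies on POSIX defining the block counts in units of f_frsize.

```python
import os


def free_bytes(path="/"):
    """Bytes available to unprivileged users on the filesystem holding path."""
    st = os.statvfs(path)
    # POSIX counts f_blocks, f_bfree and f_bavail in units of f_frsize,
    # so f_frsize (not f_bsize) is the multiplier for disk space math.
    return st.f_frsize * st.f_bavail


def is_read_only(path="/"):
    """True if the mount flags in f_flag include ST_RDONLY."""
    st = os.statvfs(path)
    return bool(st.f_flag & os.ST_RDONLY)


if __name__ == "__main__":
    st = os.statvfs("/")
    print("f_bsize :", st.f_bsize)
    print("f_frsize:", st.f_frsize)
    print("f_favail:", st.f_favail)
    print("f_flag  :", hex(st.f_flag))
    print("free    :", free_bytes("/"), "bytes")
    print("ro      :", is_read_only("/"))
```

On many Linux filesystems f_bsize and f_frsize print the same value, which is consistent with the remark above, but that is an observation and not a guarantee.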

> - statvfs(2) lacks f_type.
>
> Is there anything more we should add here? Samba wants a capabilities
> field, with things like sparse files, quotas, compression, encryption,
> case preserving/sensitive.
>
> Any thoughts?

"struct statfs" and "struct statfs64" have spare fields. We could put
the f_flag in there including a magic "this is a valid f_flag" flag,
that distinguishes from the default zero value.
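
A rough sketch of how userspace might consume such a field, written in Python for brevity. The bit values and the names ST_VALID and decode_f_flag are assumptions made for illustration, not an agreed ABI: the point is only that a dedicated "valid" bit lets a caller tell "kernel filled in f_flag, no flags set" apart from "old kernel, spare word still zero".

```python
# Hypothetical bit assignments, chosen only for this illustration.
ST_RDONLY = 0x0001
ST_NOSUID = 0x0002
ST_VALID = 0x0020  # hypothetical "this f_flag was filled in by the kernel" bit


def decode_f_flag(raw):
    """Return the set of flag names, or None if f_flag is not valid.

    None means the caller has to fall back to another source, such as
    parsing /proc/mounts.
    """
    if not raw & ST_VALID:
        return None
    names = set()
    if raw & ST_RDONLY:
        names.add("rdonly")
    if raw & ST_NOSUID:
        names.add("nosuid")
    return names


if __name__ == "__main__":
    print(decode_f_flag(0x0000))  # spare word untouched by an old kernel
    print(decode_f_flag(0x0020))  # valid, no mount flags set
    print(decode_f_flag(0x0021))  # valid, read-only
```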

Thanks,
Miklos
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Andy Lutomirski on
Nick Piggin wrote:
> This has come up a few times in the past, and I'd like to try to get
> an agreement on it. statvfs(2) importantly contains f_flag (mount
> flags), and its use is encouraged over statfs(2). The kernel
> provides a statfs syscall only.
>
> This means glibc has to provide f_flag support by parsing /proc/mounts
> and stat(2)ing mount points. This is really slow, and /proc/mounts is
> hard for the kernel to provide. It's actually the last scalability
> bottleneck in the core vfs for dbench (samba) after my patches.
>
> Not only that, but it's racy.
>
> Other than types, other differences are:
> - statvfs(2) has f_frsize, which seems fairly useless.
> - statvfs(2) has f_favail.
> - statfs(2) f_bsize is optimal transfer block, statvfs(2) f_bsize is fs
> block size. The latter could be useful for disk space algorithms.
> Both can be ill defined.
> - statvfs(2) lacks f_type.
>
> Is there anything more we should add here? Samba wants a capabilities
> field, with things like sparse files, quotas, compression, encryption,
> case preserving/sensitive.
>
> Any thoughts?

Something like fsid but actually specified to uniquely identify a
superblock. (Currently, fsid seems to be set by the filesystem, and
nothing in particular ensures that two different filesystems couldn't
have collisions.) We could guarantee (or have a flag guaranteeing) that
(fsid, st_ino) actually uniquely identifies an inode.

Similarly, something like fsid that uniquely identifies the vfsmount
could be useful, although I don't know how easy that would be to provide
for fstat?fs.

If we could expose the complete set of filesystem mount options so that
mount(1) didn't have to look at /proc/self/mounts or /etc/mtab, then
playing with chroots would be that much easier.

Should we expose superblock and vfsmount options separately? We have
read-only bind mounts now, but the way they work is rather inscrutable,
and if stat?fs could say "superblock is read-write but vfsmount is
readonly" then people might be able to make more sense of what's going on.

--Andy
From: Miklos Szeredi on
On Thu, 24 Jun 2010, Andy Lutomirski wrote:
> Something like fsid but actually specified to uniquely identify a
> superblock. (Currently, fsid seems to be set by the filesystem, and
> nothing in particular ensures that two different filesystems couldn't
> have collisions.) We could guarantee (or have a flag guaranteeing) that
> (fsid, st_ino) actually uniquely identifies an inode.
>
> Similarly, something like fsid that uniquely identifies the vfsmount
> could be useful, although I don't know how easy that would be to provide
> for fstat?fs.
>
> If we could expose the complete set of filesystem mount options so that
> mount(1) didn't have to look at /proc/self/mounts or /etc/mtab, then
> playing with chroots would be that much easier.
>
> Should we expose superblock and vfsmount options separately? We have
> read-only bind mounts now, but the way they work is rather inscrutable,
> and if stat?fs could say "superblock is read-write but vfsmount is
> readonly" then people might be able to make more sense of what's going on.

You'll find all of those things in /proc/self/mountinfo.
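
As an illustration, a mountinfo line can be split into its per-mount and per-superblock parts with a few lines of Python. This is a sketch that assumes the field layout documented in proc(5); the sample line is made up, and parse_mountinfo_line is a hypothetical helper name.

```python
# Made-up sample: a read-only bind mount on top of a read-write superblock.
SAMPLE = "36 35 98:0 /mnt1 /mnt2 ro,noatime master:1 - ext3 /dev/root rw,errors=continue"


def parse_mountinfo_line(line):
    """Split one /proc/self/mountinfo line into named fields."""
    # Whitespace inside paths is octal-escaped by the kernel (e.g. \040),
    # so splitting on whitespace keeps each field intact.
    fields = line.split()
    # Zero or more optional fields follow the mount options and are
    # terminated by a lone "-".
    sep = fields.index("-", 6)
    return {
        "mount_id": int(fields[0]),
        "parent_id": int(fields[1]),
        "dev": fields[2],                          # major:minor of the superblock
        "root": fields[3],
        "mount_point": fields[4],
        "mount_options": fields[5].split(","),     # per-vfsmount options
        "optional": fields[6:sep],
        "fs_type": fields[sep + 1],
        "source": fields[sep + 2],
        "super_options": fields[sep + 3].split(","),  # per-superblock options
    }


if __name__ == "__main__":
    m = parse_mountinfo_line(SAMPLE)
    print("vfsmount  :", m["mount_options"])
    print("superblock:", m["super_options"])
```

In the sample, the vfsmount options start with "ro" while the superblock options start with "rw", which is the "superblock is read-write but vfsmount is readonly" case.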

Thanks,
Miklos
From: Nick Piggin on
On Thu, Jun 24, 2010 at 04:03:05PM +0200, Miklos Szeredi wrote:
> On Thu, 24 Jun 2010, Nick Piggin wrote:
> > This has come up a few times in the past, and I'd like to try to get
> > an agreement on it. statvfs(2) importantly contains f_flag (mount
> > flags), and its use is encouraged over statfs(2). The kernel
> > provides a statfs syscall only.
> >
> > This means glibc has to provide f_flag support by parsing /proc/mounts
> > and stat(2)ing mount points. This is really slow, and /proc/mounts is
> > hard for the kernel to provide. It's actually the last scalability
> > bottleneck in the core vfs for dbench (samba) after my patches.
> >
> > Not only that, but it's racy.
> >
> > Other than types, other differences are:
> > - statvfs(2) has f_frsize, which seems fairly useless.
>
> statfs(2) also has f_frsize since 2.6.0, only it hasn't been
> documented (should be fixed now).
>
> > - statvfs(2) has f_favail.
> > - statfs(2) f_bsize is optimal transfer block, statvfs(2) f_bsize is fs
> > block size. The latter could be useful for disk space algorithms.
> > Both can be ill defined.
>
> They are the same, only the documentation is different.
>
> > - statvfs(2) lacks f_type.
> >
> > Is there anything more we should add here? Samba wants a capabilities
> > field, with things like sparse files, quotas, compression, encryption,
> > case preserving/sensitive.
> >
> > Any thoughts?
>
> "struct statfs" and "struct statfs64" have spare fields. We could put
> the f_flag in there including a magic "this is a valid f_flag" flag,
> that distinguishes from the default zero value.

Ah so it does. We have 5 words spare. So we should have a version
number rather than just do a per-word hack each time. We could
probably pack the version number into a few bits of f_flag though.
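
One possible shape for that packing, sketched in Python. The layout is purely an assumption for illustration (a 32-bit f_flag with the top 4 bits reserved for the version), as are the names pack_f_flag and unpack_f_flag. Version 0 would then naturally mean "old kernel, spare words not filled in".

```python
# Hypothetical layout: top 4 bits of a 32-bit f_flag carry the version.
VERSION_SHIFT = 28
VERSION_MASK = 0xF << VERSION_SHIFT
WORD_MASK = 0xFFFFFFFF


def pack_f_flag(version, flags):
    """Combine a small version number and the mount flags into one word."""
    if not 0 <= version <= 0xF:
        raise ValueError("version does not fit in 4 bits")
    if flags & ~WORD_MASK or flags & VERSION_MASK:
        raise ValueError("flags overlap the version bits")
    return (version << VERSION_SHIFT) | flags


def unpack_f_flag(raw):
    """Return (version, flags); version 0 means nothing was filled in."""
    version = (raw & VERSION_MASK) >> VERSION_SHIFT
    flags = raw & WORD_MASK & ~VERSION_MASK
    return version, flags


if __name__ == "__main__":
    raw = pack_f_flag(1, 0x0003)
    print(hex(raw), unpack_f_flag(raw))
```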

From: Andrew Lutomirski on
On Thu, Jun 24, 2010 at 10:18 AM, Miklos Szeredi <miklos(a)szeredi.hu> wrote:
> On Thu, 24 Jun 2010, Andy Lutomirski wrote:
>> Something like fsid but actually specified to uniquely identify a
>> superblock. (Currently, fsid seems to be set by the filesystem, and
>> nothing in particular ensures that two different filesystems couldn't
>> have collisions.) We could guarantee (or have a flag guaranteeing) that
>> (fsid, st_ino) actually uniquely identifies an inode.
>>
>> Similarly, something like fsid that uniquely identifies the vfsmount
>> could be useful, although I don't know how easy that would be to provide
>> for fstat?fs.
>>
>> If we could expose the complete set of filesystem mount options so that
>> mount(1) didn't have to look at /proc/self/mounts or /etc/mtab, then
>> playing with chroots would be that much easier.
>>
>> Should we expose superblock and vfsmount options separately? We have
>> read-only bind mounts now, but the way they work is rather inscrutable,
>> and if stat?fs could say "superblock is read-write but vfsmount is
>> readonly" then people might be able to make more sense of what's going on.
>
> You'll find all of those things in /proc/self/mountinfo.

Wasn't the point that /proc/self/mounts (and presumably
/proc/self/mountinfo) isn't scalable and we wanted a syscall to query
it efficiently (and racelessly)?

--Andy