From: Jörn Engel on
On Mon, 19 April 2010 12:20:56 +0200, Jens Axboe wrote:
>
> Thanks, we definitely should have put a debug statement to catch this in
> from day 1, good debugging should be an important part of any new
> infrastructure.

Woke up early and had another look at this. Looks like a much more
widespread problem. Based on a quick grep and an uncaffeinated brain:

9p        no s_bdi
afs       no s_bdi
ceph      creates its own s_bdi
cifs      no s_bdi
coda      no s_bdi
ecryptfs  no s_bdi
exofs     no s_bdi
fuse      creates its own s_bdi?
gfs2      creates its own s_bdi?
jffs2     patch exists
logfs     fixed now
ncpfs     no s_bdi
nfs       creates its own s_bdi
ocfs2     no s_bdi
smbfs     no s_bdi
ubifs     creates its own s_bdi

I excluded all filesystems that appear to be read-only, block device
based or lack any sort of backing store. So there is a chance I have
missed some as well.

Jörn

--
Simplicity is prerequisite for reliability.
-- Edsger W. Dijkstra
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Jörn Engel on
Linus,

I think this is bad enough that you should be involved. Commit 32a88aa1
broke a number of filesystems such that sync() returns 0 without doing
any work. Even politicians are better at keeping their promises.

This is caused by the two-liner in __sync_filesystem:
	if (!sb->s_bdi)
		return 0;
s_bdi is set implicitly for all filesystems using set_bdev_super(), so
most block device based filesystems are safe. There are, however, a
number of odd-balls around:

On Thu, 22 April 2010 07:54:48 +0200, Jörn Engel wrote:
>
> 9p        no s_bdi
> afs       no s_bdi
> ceph      creates its own s_bdi
> cifs      no s_bdi
> coda      no s_bdi
> ecryptfs  no s_bdi
> exofs     no s_bdi
> fuse      creates its own s_bdi?
> gfs2      creates its own s_bdi?
> jffs2     patch exists
> logfs     fixed now
> ncpfs     no s_bdi
> nfs       creates its own s_bdi
> ocfs2     no s_bdi
> smbfs     no s_bdi
> ubifs     creates its own s_bdi

Obviously this list should get checked and all affected filesystems get
repaired. Additionally we should add an assertion and BUG() or refuse
to mount or something. My original patch to that extent was this:

diff --git a/fs/super.c b/fs/super.c
index f35ac60..e8af253 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -954,6 +954,8 @@ vfs_kern_mount(struct file_system_type *type, int flags, const char *name, void
 	if (error < 0)
 		goto out_free_secdata;
 	BUG_ON(!mnt->mnt_sb);
+	BUG_ON(!mnt->mnt_sb->s_bdi &&
+			(mnt->mnt_sb->s_bdev || mnt->mnt_sb->s_mtd));

 	error = security_sb_kern_mount(mnt->mnt_sb, flags, secdata);
 	if (error)
 		goto out_sb;

The real problem is finding a condition that has neither false positives
nor false negatives. The "(mnt->mnt_sb->s_bdev || mnt->mnt_sb->s_mtd)"
part takes care of false positives like tmpfs, but it would catch none
of the network filesystems. Should we instead annotate tmpfs and friends
with something like sb->s_dont_need_bdi? It is the only way I can think
of not to miss something.

Jörn

--
People will accept your ideas much more readily if you tell them
that Benjamin Franklin said it first.
-- unknown

From: Jens Axboe on
On Thu, Apr 22 2010, Jörn Engel wrote:
> On Mon, 19 April 2010 12:20:56 +0200, Jens Axboe wrote:
> >
> > Thanks, we definitely should have put a debug statement to catch this in
> > from day 1, good debugging should be an important part of any new
> > infrastructure.
>
> Woke up early and had another look at this. Looks like a much more
> widespread problem. Based on a quick grep and an uncaffeinated brain:
>
> 9p        no s_bdi
> afs       no s_bdi
> ceph      creates its own s_bdi
> cifs      no s_bdi
> coda      no s_bdi
> ecryptfs  no s_bdi
> exofs     no s_bdi
> fuse      creates its own s_bdi?
> gfs2      creates its own s_bdi?
> jffs2     patch exists
> logfs     fixed now
> ncpfs     no s_bdi
> nfs       creates its own s_bdi
> ocfs2     no s_bdi
> smbfs     no s_bdi
> ubifs     creates its own s_bdi
>
> I excluded all filesystems that appear to be read-only, block device
> based or lack any sort of backing store. So there is a chance I have
> missed some as well.

It's funky, I was pretty sure there was/is code to set a default bdi for
non-bdev file systems. It appears to be missing, that's not good. So
options include:

- Add the appropriate per-sb bdi for these file systems (right fix), or
- Pre-fill default_backing_dev_info as a fallback ->s_bdi to at least
ensure that data gets flushed (quick fix)

I'll slap together a set of fixes for this.

--
Jens Axboe

From: Jens Axboe on
On Thu, Apr 22 2010, Jens Axboe wrote:
> On Thu, Apr 22 2010, Jörn Engel wrote:
> > On Mon, 19 April 2010 12:20:56 +0200, Jens Axboe wrote:
> > >
> > > Thanks, we definitely should have put a debug statement to catch this in
> > > from day 1, good debugging should be an important part of any new
> > > infrastructure.
> >
> > Woke up early and had another look at this. Looks like a much more
> > widespread problem. Based on a quick grep and an uncaffeinated brain:
> >
> > 9p        no s_bdi
> > afs       no s_bdi
> > ceph      creates its own s_bdi
> > cifs      no s_bdi
> > coda      no s_bdi
> > ecryptfs  no s_bdi
> > exofs     no s_bdi
> > fuse      creates its own s_bdi?
> > gfs2      creates its own s_bdi?
> > jffs2     patch exists
> > logfs     fixed now
> > ncpfs     no s_bdi
> > nfs       creates its own s_bdi
> > ocfs2     no s_bdi
> > smbfs     no s_bdi
> > ubifs     creates its own s_bdi
> >
> > I excluded all filesystems that appear to be read-only, block device
> > based or lack any sort of backing store. So there is a chance I have
> > missed some as well.
>
> It's funky, I was pretty sure there was/is code to set a default bdi for
> non-bdev file systems. It appears to be missing, that's not good. So
> options include:
>
> - Add the appropriate per-sb bdi for these file systems (right fix), or
> - Pre-fill default_backing_dev_info as a fallback ->s_bdi to at least
> ensure that data gets flushed (quick fix)
>
> I'll slap together a set of fixes for this.

Here's a series for fixing these. At this point they are totally
untested except that I did compile them. Note that your analysis
appeared correct for all cases but ocfs2, which does use get_sb_bdev()
and hence gets ->s_bdi assigned.

You can see them here, I'll post the series soon:

http://git.kernel.dk/?p=linux-2.6-block.git;a=shortlog;h=refs/heads/for-linus

The first patch is a helper addition, the rest are per-fs fixups.

--
Jens Axboe

From: David Woodhouse on
On Thu, 2010-04-22 at 12:39 +0200, Jens Axboe wrote:
>
> Here's a series for fixing these. At this point they are totally
> untested except that I did compile them. Note that your analysis
> appeared correct for all cases but ocfs2, which does use get_sb_bdev()
> and hence gets ->s_bdi assigned.
>
> You can see them here, I'll post the series soon:
>
> http://git.kernel.dk/?p=linux-2.6-block.git;a=shortlog;h=refs/heads/for-linus
>
> The first patch is a helper addition, the rest are per-fs fixups.

Do you want to include Jörn's addition of same to get_sb_mtd_set(), with
my Acked-By: David Woodhouse <David.Woodhouse(a)intel.com> ?

--
David Woodhouse Open Source Technology Centre
David.Woodhouse(a)intel.com Intel Corporation
