extfs reliability [Kernel]

Prev: extfs reliability
Next: [bug] Fixing mutex_lock() under held spinlock

From: Vladislav Bolkhovitin on 29 Jul 2010 14:50

Jan Kara, on 07/29/2010 06:34 PM wrote:
> On Thu 29-07-10 18:12:29, Vladislav Bolkhovitin wrote:
>>
>> Christoph Hellwig, on 07/29/2010 05:08 PM wrote:
>>> On Thu, Jul 29, 2010 at 05:00:10PM +0400, Vladislav Bolkhovitin wrote:
>>>> You can find full kernel logs starting from iSCSI load in the attachments.
>>>>
>>>> I already reported such issues some time ago, but my reports were not very welcome, so I gave up. Anyway, anybody can easily do my tests at any time. They don't need any special hardware, just 2 Linux boxes: one for the iSCSI target and one for the iSCSI initiator (the test box itself). But the tests apply to other transports as well. You can see there's nothing iSCSI specific in the traces.
>>>
>>> I was only talking about ext3.
>>
>> Yes, now ext3 is a lot more reliable. The only way I was able to confuse it was:
>>
>> ...
>> (2197) nb_write: handle 4272 was not open size=65475 ofs=0
>> (2199) nb_write: handle 4272 was not open size=65475 ofs=65534
>> (2201) nb_write: handle 4272 was not open size=65475 ofs=131068
>> (2203) nb_write: handle 4272 was not open size=65475 ofs=196602
>> (2205) nb_write: handle 4272 was not open size=65475 ofs=262136^C
>> ^C
>> root@ini:/mnt/dbench-mod# ^C
>> root@ini:/mnt/dbench-mod# ^C
>> root@ini:/mnt/dbench-mod# cd
>> root@ini:~# umount /mnt
>>
>> <- recover device
>>
>> root@ini:~# mount -t ext3 -o barrier=1 /dev/sdb /mnt
>> mount: wrong fs type, bad option, bad superblock on /dev/sdb,
>>        missing codepage or helper program, or other error
>>        In some cases useful info is found in syslog - try
>>        dmesg | tail or so
>>
>> Kernel log: "Jul 29 22:05:32 ini kernel: [ 2905.423092] JBD: recovery failed"
> Hmm, this is strange. Are there more messages around this one?

I'd encourage you to reproduce a similar setup and perform various kinds
of failure-injection testing. I promise you'll find a lot of strange and
interesting things ;). Software devices give unique opportunities for that.

Vlad
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

From: Ted Ts'o on 29 Jul 2010 15:00

On Thu, Jul 29, 2010 at 05:00:10PM +0400, Vladislav Bolkhovitin wrote:
> Christoph Hellwig, on 07/29/2010 12:31 PM wrote:
> > My reading of the ext3/jbd code is that we explicitly wait on I/O
> > completion of dependent writes, and only require those to actually
> > be stable by issuing a flush. If that wasn't the case the default ext3
> > barriers off behaviour would not only be dangerous on devices with
> > volatile write caches, but also on devices that do not have them,
> > which in addition to the reading of the code is not what we've seen
> > in actual power fail testing, where ext3 does well as long as there
> > is no volatile write cache.
>
> Basically, it is so, but, unfortunately, not absolutely. I've just tried 2 tests on ext4 with iSCSI:

Well, this thread was talking about something else (which is how
various file systems handle barriers), and not bugs about what happens
when a disk disappears from a system due to attachment failure --- but
that's fine, we can deal with that here.
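The ordering Christoph describes in the quote above (wait for the dependent writes to complete, make them stable with a flush, and only then rely on the commit record) can be illustrated with a userspace analogy. This is a minimal sketch, not jbd code: the single-file layout and the journal_commit() and write_all() helpers are hypothetical, and fdatasync() stands in for jbd's wait-plus-flush step. Whether fdatasync() actually flushes a volatile device write cache depends on the filesystem underneath and its barrier setting, which is exactly the question this thread is about.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical userspace analogy of journal commit ordering.
 * NOT how jbd is implemented; it only follows the same rule:
 * dependent blocks must be stable before the commit record. */

/* Write len bytes at offset off, retrying on short writes and EINTR. */
static int write_all(int fd, const char *p, size_t len, off_t off)
{
	while (len > 0) {
		ssize_t n = pwrite(fd, p, len, off);
		if (n < 0) {
			if (errno == EINTR)
				continue;
			return -1;
		}
		p += n;
		off += n;
		len -= (size_t)n;
	}
	return 0;
}

/* Journal blocks go at offset 0, the commit record right after them.
 * Returns 0 on success, -1 on error (errno set by the failing call). */
int journal_commit(const char *path, const char *blocks, size_t blocks_len,
		   const char *commit, size_t commit_len)
{
	int fd, ret = -1;

	assert(blocks != NULL && commit != NULL);
	fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
	if (fd < 0)
		return -1;

	/* 1. Submit the dependent journal blocks. */
	if (write_all(fd, blocks, blocks_len, 0) < 0)
		goto out;
	/* 2. Wait for them to reach the device; a cache flush is issued
	 *    only if the underlying filesystem has barriers enabled. */
	if (fdatasync(fd) < 0)
		goto out;
	/* 3. Only now write the commit record that refers to them ... */
	if (write_all(fd, commit, commit_len, (off_t)blocks_len) < 0)
		goto out;
	/* 4. ... and make the commit record stable as well. */
	if (fdatasync(fd) < 0)
		goto out;
	ret = 0;
out:
	close(fd);
	return ret;
}
```

If step 2 is skipped, or the device acknowledges writes that are still sitting in a volatile cache, the commit record can become durable before the blocks it describes, which is the failure mode barriers exist to prevent.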

> Segmentation fault

OK, I've looked at your kernel messages, and it looks like the problem
comes from this:

	/* Debugging code just in case the in-memory inode orphan list
	 * isn't empty. The on-disk one can be non-empty if we've
	 * detected an error and taken the fs readonly, but the
	 * in-memory list had better be clean by this point. */
	if (!list_empty(&sbi->s_orphan))
		dump_orphan_list(sb, sbi);
	J_ASSERT(list_empty(&sbi->s_orphan));   <====

This is a "should never happen" situation, and we crash so we can
figure out how we got there. For production kernels, it would arguably
be better to print a message and a WARN_ON(1), and then not force a
crash from a BUG_ON (which is what J_ASSERT is defined to use).
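The two policies Ted contrasts can be modelled in userspace. In this sketch (hypothetical names, not kernel code) assert() plays the role of J_ASSERT()/BUG_ON() and aborts the program, while the MODEL_WARN_ON() macro mimics the kernel's WARN_ON() by reporting the problem and returning the condition so the caller can carry on. The orphans parameter is an assumed stand-in for "entries left on sbi->s_orphan at unmount".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for WARN_ON(): report, then return the
 * condition so the caller can decide how to continue. */
static bool warn_on(bool cond, const char *file, int line)
{
	if (cond)
		fprintf(stderr, "WARNING: at %s:%d\n", file, line);
	return cond;
}
#define MODEL_WARN_ON(cond) warn_on((cond), __FILE__, __LINE__)

/* orphans: entries left on the in-memory orphan list at unmount.
 * Returns true if the "should never happen" case was detected. */
bool put_super_check(size_t orphans, bool crash_on_error)
{
	if (crash_on_error) {
		/* J_ASSERT()/BUG_ON() style: stop dead so the state can be
		 * examined (a no-op if compiled with NDEBUG). */
		assert(orphans == 0);
		return false;
	}
	/* Suggested production behaviour: print a message, warn, go on. */
	if (orphans != 0)
		fprintf(stderr, "orphan list not empty: %zu entries\n", orphans);
	return MODEL_WARN_ON(orphans != 0);
}
```

With this model, put_super_check(2, false) prints two lines to stderr and returns true, while put_super_check(2, true) aborts, mirroring the J_ASSERT crash in the attached log.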

Looking at your messages and the ext4_delete_inode() warning, I think
I know what caused it. Can you try this patch (attached below) and
see if it fixes things for you?

> I already reported such issues some time ago, but my reports were
> not very welcome, so I gave up. Anyway, anybody can easily do
> my tests at any time.

My apologies. I've gone through the linux-ext4 mailing list logs, and
I can't find any mention of this problem from any username @vlnb.net.
I'm not sure where you reported it, and I'm sorry we dropped your bug
report. All I can say is that we do the best that we can, and our
team is relatively small and short-handed.

- Ted

From a190d0386e601d58db6d2a6cbf00dc1c17d02136 Mon Sep 17 00:00:00 2001
From: Theodore Ts'o <tytso(a)mit.edu>
Date: Thu, 29 Jul 2010 14:54:48 -0400
Subject: [PATCH] patch explicitly-drop-inode-from-orphan-list-on-ext4_delete_inode-failure

---
 fs/ext4/inode.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index a52d5af..533b607 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -221,6 +221,7 @@ void ext4_delete_inode(struct inode *inode)
 				     "couldn't extend journal (err %d)", err);
 		stop_handle:
 			ext4_journal_stop(handle);
+			ext4_orphan_del(NULL, inode);
 			goto no_delete;
 		}
 	}
--
1.7.0.4
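Why this one-line patch should help can be seen in a toy model. It is a simplified sketch that assumes, as in the ext4 code of that era, that an inode being deleted is already on the in-memory orphan list (sbi->s_orphan) and that the normal delete path removes it. All names here (struct model_inode, orphan_add(), delete_inode(), model_put_super(), and so on) are hypothetical. Without the fix, the "couldn't extend journal" error path jumps to no_delete and leaves a stale entry behind, which is what the J_ASSERT() at unmount trips over; with the fix, the entry is dropped on the error path too. (As far as I can tell from the ext4 source of that time, passing a NULL handle to ext4_orphan_del() removes only the in-memory entry, since there is no valid transaction with which to update the on-disk list.)

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified model; not ext4 code. */
struct model_inode {
	int ino;
	struct model_inode *next;	/* link in the in-memory orphan list */
	bool on_orphan_list;
};

struct model_sb {
	struct model_inode *orphans;	/* stands in for sbi->s_orphan */
};

void orphan_add(struct model_sb *sb, struct model_inode *inode)
{
	inode->next = sb->orphans;
	sb->orphans = inode;
	inode->on_orphan_list = true;
}

/* In-memory removal only, like ext4_orphan_del(NULL, inode). */
void orphan_del(struct model_sb *sb, struct model_inode *inode)
{
	struct model_inode **pp = &sb->orphans;

	while (*pp) {
		if (*pp == inode) {
			*pp = inode->next;
			inode->next = NULL;
			inode->on_orphan_list = false;
			return;
		}
		pp = &(*pp)->next;
	}
}

/* extend_ok simulates whether extending the journal succeeded;
 * with_fix selects the patched error path. */
void delete_inode(struct model_sb *sb, struct model_inode *inode,
		  bool extend_ok, bool with_fix)
{
	if (!extend_ok) {
		/* "couldn't extend journal" error path */
		if (with_fix)
			orphan_del(sb, inode);	/* the line the patch adds */
		return;				/* goto no_delete */
	}
	/* normal path: finish the delete and drop the orphan entry */
	orphan_del(sb, inode);
}

bool orphan_list_empty(const struct model_sb *sb)
{
	return sb->orphans == NULL;
}

/* Unmount-time check; the real code uses J_ASSERT() here. */
void model_put_super(const struct model_sb *sb)
{
	assert(orphan_list_empty(sb));
}
```

Calling delete_inode() with extend_ok false and with_fix false leaves the inode on the list, so a later model_put_super() would abort; with with_fix true the list is empty and the check passes.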

