fat: excessive log spamming due to corrupted fs [Kernel]


From: Johannes Stezenbach on 22 Apr 2010 12:10

Hi,

my office mate has a 1GB USB stick with a corrupted vfat fs.

(I don't really know when and how it got corrupted. I
just mounted it, copied a file onto it, unmounted, unplugged.
After that corruption showed up when accessing the newly
copied file. I'm running 2.6.33.1.)

Mounting still worked but when accessing the new file the kernel
log was filled up with

Apr 22 16:30:18 zzz kernel: FAT: Filesystem error (dev sdb1)
Apr 22 16:30:18 zzz kernel: fat_get_cluster: invalid cluster chain (i_pos 34568)
Apr 22 16:30:18 zzz kernel: File system has been set read-only
Apr 22 16:30:18 zzz kernel: FAT: Filesystem error (dev sdb1)
Apr 22 16:30:18 zzz kernel: fat_get_cluster: invalid cluster chain (i_pos 34568)
Apr 22 16:30:18 zzz kernel: FAT: Filesystem error (dev sdb1)
Apr 22 16:30:18 zzz kernel: fat_get_cluster: invalid cluster chain (i_pos 34568)
....
Apr 22 16:30:18 zzz kernel: FAT: Filesystem error lidAT: Filesysalid cluster chain (i_pos 34568)
Apr 22 16:30:18 zzz kernel: FAT: Filesystem ealid cluster chain (i_pos 3AT: Fiesysalid cluster chain (i_pos 34568)
Apr 22 16:30:18 zzz kernel: FAT: Filesystem ealid cluster chain (i_pos 3AT: Fesystalid cluster chain (i_pos 34568)
Apr 22 16:30:18 zzz kernel: FAT: Filesystem alid cluster chain (i_pos 3AT: Fiesyalid cluster chain (i_pos 34568)
Apr 22 16:30:18 zzz kernel: FAT: Filesystem ealid cluster chain (i_pos AT: Fiesystalid cluster chain (i_pos 34568)
Apr 22 16:30:18 zzz kernel: FAT: Filesystealid cluster chain (i_pos 3AT: Fiesystalid cluster chain (i_pos 34568)
....
(~10000 lines)

It seems that fat_fs_error() generates corrupted output
(on an Athlon 4850e dual core), and the excessive amounts
of output are IMHO not useful.

BTW: dosfsck refused to fix it.
BTW2: when my office mate plugged it into her MacBook it
caused MacOS to crash ;-)

Johannes
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

From: OGAWA Hirofumi on 27 Apr 2010 05:50

Johannes Stezenbach <js(a)sig21.net> writes:

> Mounting still worked but when accessing the new file the kernel
> log was filled up with
>
> Apr 22 16:30:18 zzz kernel: FAT: Filesystem error (dev sdb1)
> Apr 22 16:30:18 zzz kernel: fat_get_cluster: invalid cluster chain (i_pos 34568)
> Apr 22 16:30:18 zzz kernel: File system has been set read-only
> Apr 22 16:30:18 zzz kernel: FAT: Filesystem error (dev sdb1)
> Apr 22 16:30:18 zzz kernel: fat_get_cluster: invalid cluster chain (i_pos 34568)
> Apr 22 16:30:18 zzz kernel: FAT: Filesystem error (dev sdb1)
> Apr 22 16:30:18 zzz kernel: fat_get_cluster: invalid cluster chain (i_pos 34568)
> ...
> Apr 22 16:30:18 zzz kernel: FAT: Filesystem error lidAT: Filesysalid cluster chain (i_pos 34568)
> Apr 22 16:30:18 zzz kernel: FAT: Filesystem ealid cluster chain (i_pos 3AT: Fiesysalid cluster chain (i_pos 34568)
> Apr 22 16:30:18 zzz kernel: FAT: Filesystem ealid cluster chain (i_pos 3AT: Fesystalid cluster chain (i_pos 34568)
> Apr 22 16:30:18 zzz kernel: FAT: Filesystem alid cluster chain (i_pos 3AT: Fiesyalid cluster chain (i_pos 34568)
> Apr 22 16:30:18 zzz kernel: FAT: Filesystem ealid cluster chain (i_pos AT: Fiesystalid cluster chain (i_pos 34568)
> Apr 22 16:30:18 zzz kernel: FAT: Filesystealid cluster chain (i_pos 3AT: Fiesystalid cluster chain (i_pos 34568)
> ...
> (~10000 lines)
>
>
> It seems that fat_fs_error() generates corrupted output
> (on an Athlon 4850e dual core), and the excessive amounts
> of output are IMHO not useful.

It seems that userland, readahead, or the directory entry reads didn't
stop on EIO (for directories that is intended, though, to salvage as many
files as possible). I'll think about rate-limiting the fs corruption
reports.
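
[Editor's sketch] To illustrate what rate-limiting the corruption report
could mean, here is a minimal userspace model in C: at most `burst`
reports are let through per `interval`, and the rest are only counted.
The struct and function names are hypothetical and are not the kernel's
API (the kernel has its own helpers such as printk_ratelimit()); time is
passed in by the caller so the logic is easy to test.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical interval/burst limiter; names are illustrative only. */
struct report_limit {
    long interval;      /* window length, in caller-supplied time units */
    int burst;          /* reports allowed per window */
    long window_start;  /* time at which the current window began */
    int printed;        /* reports let through in the current window */
    int suppressed;     /* total reports dropped so far */
};

static void report_limit_init(struct report_limit *rl, long interval, int burst)
{
    assert(interval > 0 && burst > 0);
    rl->interval = interval;
    rl->burst = burst;
    rl->window_start = 0;
    rl->printed = 0;
    rl->suppressed = 0;
}

/* Returns true if a report may be emitted at time `now`. */
static bool report_allowed(struct report_limit *rl, long now)
{
    if (now - rl->window_start >= rl->interval) {
        /* The previous window has expired: start a fresh one. */
        rl->window_start = now;
        rl->printed = 0;
    }
    if (rl->printed < rl->burst) {
        rl->printed++;
        return true;
    }
    /* Over budget for this window: drop the report, but count it. */
    rl->suppressed++;
    return false;
}
```

A caller would wrap each error report in `if (report_allowed(&rl, now))`,
so a file that triggers the same cluster-chain error thousands of times
produces a handful of lines plus a suppressed count instead of ~10000.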

I have no idea about the message corruption; vfat just calls vprintk() for
it. I'll look at the current vprintk() locking.

Thanks.
--
OGAWA Hirofumi <hirofumi(a)mail.parknet.co.jp>

From: Johannes Stezenbach on 27 Apr 2010 06:20

On Tue, Apr 27, 2010 at 06:48:04PM +0900, OGAWA Hirofumi wrote:
> Johannes Stezenbach <js(a)sig21.net> writes:
>
> > Apr 22 16:30:18 zzz kernel: FAT: Filesystem error lidAT: Filesysalid cluster chain (i_pos 34568)
> > Apr 22 16:30:18 zzz kernel: FAT: Filesystem ealid cluster chain (i_pos 3AT: Fiesysalid cluster chain (i_pos 34568)
> > Apr 22 16:30:18 zzz kernel: FAT: Filesystem ealid cluster chain (i_pos 3AT: Fesystalid cluster chain (i_pos 34568)
> > Apr 22 16:30:18 zzz kernel: FAT: Filesystem alid cluster chain (i_pos 3AT: Fiesyalid cluster chain (i_pos 34568)
> > Apr 22 16:30:18 zzz kernel: FAT: Filesystem ealid cluster chain (i_pos AT: Fiesystalid cluster chain (i_pos 34568)
> > Apr 22 16:30:18 zzz kernel: FAT: Filesystealid cluster chain (i_pos 3AT: Fiesystalid cluster chain (i_pos 34568)
>
> I have no idea about the message corruption; vfat just calls vprintk() for
> it. I'll look at the current vprintk() locking.

I think using multiple printk() calls per line is prone to corruption on
SMP; see the comment about KERN_CONT in kernel.h. But maybe it only happens
for /var/log/kern.log and my xconsole when too much output is generated
too quickly: "dmesg -s 10000000 | less" did not show the corruption
(but it only shows ~2700 lines out of the ~10000).
Still, I think fat should format into a buffer with vsnprintf() and then
call printk() once, instead of using multiple printk() + vprintk() calls.
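
[Editor's sketch] The "format into a buffer, then print once" idea can be
modelled in userspace C as below. This is only an illustration under
stated assumptions: format_fs_error() and emit_line() are hypothetical
names, the message layout is made up for the example, and fwrite() on a
stdio stream stands in for the single printk() call.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: build the whole report in `buf`.
 * Returns the length the full message needs (snprintf-style), so a
 * return value >= size means the text in `buf` was truncated.
 */
static int format_fs_error(char *buf, size_t size, const char *dev,
                           const char *fmt, ...)
{
    va_list args;
    int n, m;

    assert(buf != NULL && size > 0);

    /* Fixed prefix first, into the same buffer as the details. */
    n = snprintf(buf, size, "FAT: Filesystem error (dev %s): ", dev);
    if (n < 0 || (size_t)n >= size)
        return n;   /* error, or the prefix alone was truncated */

    /* Append the caller's formatted details after the prefix. */
    va_start(args, fmt);
    m = vsnprintf(buf + n, size - (size_t)n, fmt, args);
    va_end(args);
    if (m < 0)
        return m;
    return n + m;
}

/* Emit the finished line with exactly one output call. */
static size_t emit_line(FILE *out, const char *line)
{
    return fwrite(line, 1, strlen(line), out);
}
```

Because the prefix and the details are assembled before anything is
emitted, another writer cannot interleave its output between the two
parts, which is the failure mode that separate calls per line allow.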

In xconsole it looks like this:

Apr 27 12:06:45 zzz kernel: <systerrouster chain (i_pos 34568)
Apr 27 12:06:45 zzz kernel: <systeerroruster chain (i_pos 34568)
Apr 27 12:06:45 zzz kernel: <systeerroruster chain (i_pos 34568)
Apr 27 12:06:45 zzz kernel: <3systerroruster chain (i_pos 34568)
Apr 27 12:06:45 zzz kernel: uster chain (i_pos 34568)

Thanks
Johannes

From: OGAWA Hirofumi on 27 Apr 2010 07:30

Johannes Stezenbach <js(a)sig21.net> writes:

> On Tue, Apr 27, 2010 at 06:48:04PM +0900, OGAWA Hirofumi wrote:
>> Johannes Stezenbach <js(a)sig21.net> writes:
>>
>> > Apr 22 16:30:18 zzz kernel: FAT: Filesystem error lidAT: Filesysalid cluster chain (i_pos 34568)
>> > Apr 22 16:30:18 zzz kernel: FAT: Filesystem ealid cluster chain (i_pos 3AT: Fiesysalid cluster chain (i_pos 34568)
>> > Apr 22 16:30:18 zzz kernel: FAT: Filesystem ealid cluster chain (i_pos 3AT: Fesystalid cluster chain (i_pos 34568)
>> > Apr 22 16:30:18 zzz kernel: FAT: Filesystem alid cluster chain (i_pos 3AT: Fiesyalid cluster chain (i_pos 34568)
>> > Apr 22 16:30:18 zzz kernel: FAT: Filesystem ealid cluster chain (i_pos AT: Fiesystalid cluster chain (i_pos 34568)
>> > Apr 22 16:30:18 zzz kernel: FAT: Filesystealid cluster chain (i_pos 3AT: Fiesystalid cluster chain (i_pos 34568)
>>
>> I have no idea about the message corruption; vfat just calls vprintk() for
>> it. I'll look at the current vprintk() locking.
>
> I think using multiple printk() calls per line is prone to corruption on
> SMP; see the comment about KERN_CONT in kernel.h. But maybe it only happens
> for /var/log/kern.log and my xconsole when too much output is generated
> too quickly: "dmesg -s 10000000 | less" did not show the corruption
> (but it only shows ~2700 lines out of the ~10000).
> Still, I think fat should format into a buffer with vsnprintf() and then
> call printk() once, instead of using multiple printk() + vprintk() calls.

I think the KERN_CONT issue doesn't explain this corruption (i.e. it looks
preempted in the middle of vprintk()), so even a single vprintk() call
would not fix it. One possibility is that the printk buffer overflowed and
the message was truncated, but I'm not sure for now, and I haven't checked
the current printk code yet.

Thanks.

> In xconsole it looks like this:
>
> Apr 27 12:06:45 zzz kernel: <systerrouster chain (i_pos 34568)
> Apr 27 12:06:45 zzz kernel: <systeerroruster chain (i_pos 34568)
> Apr 27 12:06:45 zzz kernel: <systeerroruster chain (i_pos 34568)
> Apr 27 12:06:45 zzz kernel: <3systerroruster chain (i_pos 34568)
> Apr 27 12:06:45 zzz kernel: uster chain (i_pos 34568)
--
OGAWA Hirofumi <hirofumi(a)mail.parknet.co.jp>

From: OGAWA Hirofumi on 27 Apr 2010 12:50

OGAWA Hirofumi <hirofumi(a)mail.parknet.co.jp> writes:

>> comment about KERN_CONT in kernel.h. But maybe it only happens
>> for /var/log/kern.log and my xconsole when generating too much
>> output too quickly, "dmesg -s 10000000 | less" did not show
>> the corruption (but only shows ~2700 lines out of the ~10000).

Um, please check syslog at the same time as dmesg. If there is corruption
in syslog but not in dmesg, it sounds like a userland problem.
--
OGAWA Hirofumi <hirofumi(a)mail.parknet.co.jp>
