From: Jeffrey Merkey on
Still seeing file system corruption after journal recovery in EXT3.
It's easy to reproduce, though the symptoms vary. One way is to
rebuild a program and while the program is being compiled just shut
off power to the system by pulling the plug. I am seeing the
/root/.viminfo file trashed after recovery if Vim was active during
poweroff. I am also seeing object modules getting built which the LD
linker claims are "invalid" following a recovery event. I suspect a
bug in the buffer cache since deleting the file still causes the old
data to be returned from buffer cache even when the sectors are
overwritten, but both are interrelated. Seems in some way related to
EXT3 recovery which results in the buffer cache returning old sectors
and junk.

It's not hard to reproduce. The symptoms are always a little different,
but the /root/.viminfo file getting nuked seems to be a common effect of
this bug.

Jeff
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Valdis.Kletnieks on
On Mon, 07 Jun 2010 14:45:38 MDT, Jeffrey Merkey said:
> Still seeing file system corruption after journal recovery in EXT3.

Are you getting bitten by one of these mount options (from 'man mount')?
There were changes a few releases ago, so you might want to check what
your kernel build defaulted to in your 2.6.34.

data={journal|ordered|writeback}
       Specifies the journalling mode for file data. Metadata is
       always journaled. To use modes other than ordered on the root
       filesystem, pass the mode to the kernel as boot parameter, e.g.
       rootflags=data=journal.

       journal
              All data is committed into the journal prior to being
              written into the main filesystem.

       ordered
              This is the default mode. All data is forced directly
              out to the main file system prior to its metadata being
              committed to the journal.

       writeback
              Data ordering is not preserved - data may be written into
              the main filesystem after its metadata has been committed
              to the journal. This is rumoured to be the
              highest-throughput option. It guarantees internal filesystem
              integrity, however it can allow old data to appear in
              files after a crash and journal recovery.

barrier=0 / barrier=1
       This enables/disables barriers. barrier=0 disables it,
       barrier=1 enables it. Write barriers enforce proper on-disk
       ordering of journal commits, making volatile disk write caches
       safe to use, at some performance penalty. The ext3 filesystem
       does not enable write barriers by default. Be sure to enable
       barriers unless your disks are battery-backed one way or
       another. Otherwise you risk filesystem corruption in case of
       power failure.
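
One way to see which of these options a mounted filesystem is actually
using is to look at its entry in /proc/mounts. The following is only a
minimal sketch in Python, assuming the usual "device mountpoint fstype
options ..." line layout. The function name journal_options and the
sample line are made up for illustration. An option may be absent from
the line, and the sketch reports that as None rather than guessing.

```python
def journal_options(mount_line):
    """Extract data= and barrier= from one /proc/mounts-style line."""
    fields = mount_line.split()
    if len(fields) < 4:
        raise ValueError("expected: device mountpoint fstype options ...")
    device, mountpoint, fstype, options = fields[:4]
    # None means the option did not appear in the options field at all.
    found = {"data": None, "barrier": None}
    for opt in options.split(","):
        key, sep, value = opt.partition("=")
        if sep and key in found:
            found[key] = value
    return mountpoint, fstype, found


if __name__ == "__main__":
    # Hypothetical sample line; on a real system read /proc/mounts instead.
    sample = "/dev/sda1 / ext3 rw,errors=continue,data=writeback 0 0"
    print(journal_options(sample))
```

To check a live system, feed each line of /proc/mounts to the function
and look at the entries whose fstype is ext3.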

From: Eric Sandeen on
Jeffrey Merkey wrote:
> Still seeing file system corruption after journal recovery in EXT3.
> It's easy to reproduce, though the symptoms vary. One way is to
> rebuild a program and while the program is being compiled just shut
> off power to the system by pulling the plug. I am seeing the
> /root/.viminfo file trashed after recovery if Vim was active during
> poweroff. I am also seeing object modules getting built which the LD
> linker claims are "invalid" following a recovery event. I suspect a
> bug in the buffer cache since deleting the file still causes the old
> data to be returned from buffer cache even when the sectors are
> overwritten, but both are interrelated. Seems in some way related to
> EXT3 recovery which results in the buffer cache returning old sectors
> and junk.
>
> It's not hard to reproduce. The symptoms are always a little different,
> but the /root/.viminfo file getting nuked seems to be a common effect of
> this bug.

"file system corruption" usually means corrupted metadata, but I guess
here you mean file corruption, i.e. corrupted data.

If you have buffered data in the cache, it will be lost when you pull
the plug. If your userspace doesn't sync it, this is expected. But it's
not clear to me what you're seeing.
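
For reference, what "sync it" means for a program writing a file can be
sketched roughly as below. This is a minimal illustration in Python,
assuming a POSIX system. The helper name write_durably is hypothetical,
and it shows only the usual flush-then-fsync pattern, not what vim or
the compiler actually do.

```python
import os
import tempfile


def write_durably(path, data):
    """Write bytes to path and ask the kernel to push them to stable storage."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()              # move Python's userspace buffer into the page cache
        os.fsync(f.fileno())   # ask the kernel to write the cached pages to the device
    # Also fsync the containing directory so the new directory entry is flushed.
    dir_fd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "example.o")
        write_durably(target, b"\x7fELF")
        print(os.path.getsize(target))
```

A program that skips the fsync calls leaves its recent writes in the
cache only, which is the case described above.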

I'm also not clear on what you mean about deleting the file and having old
data returned. Maybe a little cut and paste from the screen would help
explain what you see.

I'd also check CONFIG_EXT3_DEFAULTS_TO_ORDERED and be sure you're
using data=ordered mode by default.
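
Checking that option can be sketched as a small parser for kernel
.config text. This assumes the standard .config syntax, where an
enabled option appears as CONFIG_FOO=y and a disabled one as
"# CONFIG_FOO is not set". The function name config_state and the
sample text are illustrative only.

```python
def config_state(config_text, option):
    """Return the value if set, 'not set' if explicitly disabled, None if absent."""
    for line in config_text.splitlines():
        line = line.strip()
        if line.startswith(option + "="):
            return line.split("=", 1)[1]
        if line == "# " + option + " is not set":
            return "not set"
    return None


if __name__ == "__main__":
    # Illustrative fragment; a real check would read the running kernel's config file.
    sample = "CONFIG_EXT3_FS=y\n# CONFIG_EXT3_DEFAULTS_TO_ORDERED is not set\n"
    print(config_state(sample, "CONFIG_EXT3_DEFAULTS_TO_ORDERED"))
```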

-Eric

> Jeff

From: Eric Sandeen on

On Jun 7, 2010, at 6:55 PM, Jeffrey Merkey <jeffmerkey(a)gmail.com> wrote:

> ---------- Forwarded message ----------
> From: Jeffrey Merkey <jeffmerkey(a)gmail.com>
> Date: Mon, Jun 7, 2010 at 5:54 PM
> Subject: Re: EXT3 File System Corruption 2.6.34
> To: Eric Sandeen <sandeen(a)sandeen.net>
>
>
> REPLY TO ALL
>
> CONFIG_EXT3_DEFAULTS_TO_ORDERED is not set
>
> Whether set this way or not, should not see corruption.

Here you are mistaken. Mount with data=ordered and see. Writeback
can expose stale data.

-Eric

> I am seeing
> data corruption including the following:
>
> /boot/grub/grub.conf getting filled with binary chars
> /root/.viminfo filled with strange text chars (not binary)
> .o files filled with the same garbage.
>
> Looks like EXT3 metadata -- maybe some blocks getting transposed
> somewhere?
>
> I will recreate the data patterns I see during corruption and post them
> here. They are consistent with some sort of fill pattern -- at least
> what I see in viminfo is.
>
> In the case of corrupted .o files, the endian headers are missing and
> trashed in the OBJ section headers -- chances are the same kind of
> garbage.
>
> Jeff
From: Valdis.Kletnieks on
On Mon, 07 Jun 2010 20:37:06 MDT, Jeffrey Merkey said:
> Cool. I'll use that from now on. wonder if the source code came from
> xdump 10 years ago ... LOL

Probably not, given that 'man hexdump' says:

BSD April 18, 1994 BSD

Plus, they obviously rolled the code for '-e formatstring' themselves; nobody
could have been so desperate as to steal that code. ;)

hexdump -e '"%08.08_ax " 4/4 "%08X " " " 4/4 "%08x " ' -e '" *" 32/1 "%_p"' -e '"*\n"'

That's so old-skool it hurts. :)