From: Jeffrey Merkey on
---------- Forwarded message ----------
From: Jeffrey Merkey <jeffmerkey(a)gmail.com>
Date: Mon, Jun 7, 2010 at 5:54 PM
Subject: Re: EXT3 File System Corruption 2.6.34
To: Eric Sandeen <sandeen(a)sandeen.net>


REPLY TO ALL

CONFIG_EXT3_DEFAULTS_TO_ORDERED is not set

Whether set this way or not, should not see corruption.  I am seeing
data corruption including the following:

/boot/grub/grub.conf getting filled with binary chars
/root/.viminfo filled with strange text chars (not binary)
.o files filled with the same garbage.

Looks like EXT3 metadata -- maybe some blocks getting transposed somewhere?

I will recreate the data patterns I see during corruption and post
them here.  They are consistent with some sort of fill pattern -- at
least what I see in viminfo is.

In the case of corrupted .o files, the endian headers are missing and
trashed in the OBJ section headers -- chances are the same kind of
garbage.

Jeff


>> Still seeing file system corruption after journal recovery in EXT3.
>> It's easy to reproduce, though the symptoms vary.  One way is to
>> rebuild a program and while the program is being compiled just shut
>> off power to the system by pulling the plug.  I am seeing the
>> /root/.viminfo file trashed after recovery if Vim was active during
>> poweroff.  I am also seeing object modules getting built which the LD
>> linker claims are "invalid" following a recovery event.  I suspect a
>> bug in the buffer cache since deleting the file still causes the old
>> data to be returned from buffer cache even when the sectors are
>> overwritten, but both are interrelated.  Seems in some way related to
>> EXT3 recovery which results in the buffer cache returning old sectors
>> and junk.
>>
>> Not hard to reproduce.  The symptoms are always a little different,
>> but the /root/.viminfo file getting nuked seems a common effect of
>> this bug.
>
> "file system corruption" usually means corrupted metadata, but I guess
> here you mean file corruption, i.e. corrupted data.
>
> If you have buffered data in the cache, it will be lost when you pull
> the plug.  If your userspace doesn't sync it, this is expected.  But it's
> not clear to me what you're seeing.
>
> I'm also not clear on what you mean about deleting the file and having old
> data returned.  Maybe a little cut and paste from the screen would help
> explain what you see.
>
> I'd also check CONFIG_EXT3_DEFAULTS_TO_ORDERED and be sure you're
> using data=ordered mode by default.
>
> -Eric
>
>> Jeff
>
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Eric Sandeen on
Jeffrey Merkey wrote:

> OK. I will set this up. You may want to make this option the default
> in the build scripts. here is a corrupted file.

It was default, but Linus changed it a while back.

> This was a .gif
> image file I saved THEN AFTER SAVING THE FILE I pulled the power to
> the machine and during recovery the file was FUCKED.

I assume your application did not sync the data, and buffered data
loss is expected on a power loss.
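
For illustration, here is a minimal sketch of what "syncing the data" can look like from userspace. It is an assumption about how a save routine might be written, not what any particular application in this thread does, and the name durable_write is hypothetical. Whether the bytes actually reach stable storage also depends on the drive's write cache and the filesystem's mount options.

```python
import os
import tempfile


def durable_write(path, data):
    """Write bytes to path and ask the kernel to flush them before returning.

    Hypothetical helper for illustration only.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Temporary file in the same directory, so the rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()             # push Python's userspace buffer to the kernel
            os.fsync(tmp.fileno())  # ask the kernel to write the file data out
        os.replace(tmp_path, path)  # atomically swap in the new contents
    except BaseException:
        # Remove the temporary file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    # fsync the directory so the rename itself is written out (POSIX systems).
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

A program that only calls write() and close() leaves the data in the page cache until the kernel gets around to flushing it, which is the window in which pulling the plug loses it.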

> At any rate,
> this does not happen with 2.6.28.

That I can't explain for sure... different timing, perhaps.

> I dumped the file with xdump a util I use internally for my own use so
> you could see the file contents as text and I could post it here.
> This was an image file but look what ended up in it -- directory
> blocks and such. Take a look:

As I said, these are stale blocks exposed due to data=writeback.  That is
known behavior and, unfortunately, the default for ext3.  If you find
similar problems when mounted data=ordered, it's a more interesting report.

-Eric

> 0 1 2 3 4 5 6 7 8 9 A B C D E F
> 00000000 6C 73 0A 63 64 20 2E 2E 0A 63 6C 73 0A 6C 73 0A ls.cd ...cls.ls.
> 00000010 63 64 20 6C 69 6E 75 78 2D 32 2E 36 2E 33 34 2D cd linux-2.6.34-
> 00000020 6D 64 62 2F 0A 63 6C 73 0A 6C 73 0A 63 64 20 2E mdb/.cls.ls.cd .
> 00000030 2E 0A 63 6C 73 0A 6C 73 0A 63 64 20 6C 69 6E 75 ..cls.ls.cd linu

<giant snip>
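
For readers without a dump tool at hand, here is a small sketch that decodes the first row of the dump above. Only the hex bytes come from the posting; the variable names are illustrative. The bytes are plain ASCII and read like a list of shell commands, whereas a GIF file would begin with "GIF87a" or "GIF89a".

```python
# First row of the dump, copied as posted.
row = "6C 73 0A 63 64 20 2E 2E 0A 63 6C 73 0A 6C 73 0A"

data = bytes.fromhex(row)    # spaces between byte pairs are skipped
text = data.decode("ascii")  # every byte in this row is 7-bit ASCII

# One entry per newline-terminated line found in the row.
commands = text.splitlines()
print(commands)              # → ['ls', 'cd ..', 'cls', 'ls']

# A real GIF starts with one of these signatures; this data does not.
looks_like_gif = data[:6] in (b"GIF87a", b"GIF89a")
print(looks_like_gif)        # → False
```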
From: Bill Davidsen on
Jeffrey Merkey wrote:
> ---------- Forwarded message ----------
> From: Jeffrey Merkey <jeffmerkey(a)gmail.com>
> Date: Mon, Jun 7, 2010 at 7:55 PM
> Subject: Re: EXT3 File System Corruption 2.6.34
> To: Eric Sandeen <sandeen(a)sandeen.net>
>
>
>> On Jun 7, 2010, at 6:55 PM, Jeffrey Merkey <jeffmerkey(a)gmail.com> wrote:
>>
>>> ---------- Forwarded message ----------
>>> From: Jeffrey Merkey <jeffmerkey(a)gmail.com>
>>> Date: Mon, Jun 7, 2010 at 5:54 PM
>>> Subject: Re: EXT3 File System Corruption 2.6.34
>>> To: Eric Sandeen <sandeen(a)sandeen.net>
>>>
>>>
>>> REPLY TO ALL
>>>
>>> CONFIG_EXT3_DEFAULTS_TO_ORDERED is not set
>>>
>>> Whether set this way or not, should not see corruption.
>> Here you are mistaken. Mount with data=ordered and see. Writeback can
>> expose stale data.
>>
>> -Eric
>>
>
> OK. I will set this up. You may want to make this option the default
> in the build scripts. here is a corrupted file. This was a .gif
> image file I saved THEN AFTER SAVING THE FILE I pulled the power to
> the machine and during recovery the file was FUCKED. At any rate,
> this does not happen with 2.6.28.
>
Having bad things happen when power is removed is not much of a surprise, and
various options can fix that at the cost of speed. The fact that this didn't
happen with 2.6.28 is bothersome.

I actually take some care to avoid testing behavior in this area; it is not
my normal intended mode of operation.

--
Bill Davidsen <davidsen(a)tmr.com>
"We have more to fear from the bungling of the incompetent than from
the machinations of the wicked." - from Slashdot
From: Jeffrey Merkey on
Well, I set the system to the default ordered mode and the problem
went away.  EXT3 recovers nicely now.  I run across this all the time
since I develop high speed kernel stuff and have a lot of cases where
a bug crashes the system.  This time it showed up while I was developing
the MDB debugger with the hw_breakpoint interface.  The system kept
crashing until I figured out that this newer interface had hooked the
notify_die handlers and was trapping breakpoints, which caused a lot of
hangs until I fixed it.  So it is something I ran across coincidentally.
The default ordered mode makes ext3 robust again.
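
One way to confirm which journaling mode a mounted ext3 filesystem is using is to look for the data= option in the mount table. The sketch below parses text in /proc/mounts format. The function name and the sample lines are hypothetical, and it is an assumption that the kernel in use lists data= for every ext3 mount; where it does not, the function returns None for that mount point.

```python
def ext3_data_modes(mounts_text):
    """Map each ext3 mount point to its data= option, or None if not listed."""
    modes = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        # Expected fields: device, mount point, fs type, options, dump, pass
        if len(fields) < 4 or fields[2] != "ext3":
            continue
        mode = None
        for opt in fields[3].split(","):
            if opt.startswith("data="):
                mode = opt[len("data="):]
        modes[fields[1]] = mode
    return modes


# Hypothetical sample in /proc/mounts format, not taken from a real system.
sample = """\
/dev/sda1 / ext3 rw,errors=continue,data=ordered 0 0
/dev/sda2 /home ext3 rw,data=writeback 0 0
proc /proc proc rw 0 0
"""

print(ext3_data_modes(sample))  # → {'/': 'ordered', '/home': 'writeback'}
```

On a live system the same function could be fed the contents of /proc/mounts instead of the sample string.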

Jeff

