From: Paul E. McKenney on
On Mon, Sep 14, 2009 at 07:40:27AM -0700, Linus Torvalds wrote:
>
>
> On Mon, 14 Sep 2009, Ingo Molnar wrote:
> >
> > BUG kmalloc-64: Poison overwritten
> > -----------------------------------------------------------------------------
> >
> > INFO: 0xf498f6a0-0xf498f6a7. First byte 0x90 instead of 0x6b
> > INFO: Allocated in bdi_alloc_work+0x2b/0x100 age=175 cpu=1 pid=3514
> > INFO: Freed in bdi_work_free+0x45/0x60 age=9 cpu=1 pid=3509
> > INFO: Slab 0xc3257d84 objects=36 used=11 fp=0xf498f690 flags=0x400000c3
> > INFO: Object 0xf498f690 @offset=1680 fp=0xf498fe00
> >
> > Bytes b4 0xf498f680: ab 0d 00 00 9c 27 ff ff 5a 5a 5a 5a 5a 5a 5a 5a ?....'??ZZZZZZZZ
> > Object 0xf498f690: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
> > Object 0xf498f6a0: 90 f3 98 f4 60 3c 11 c1 6b 6b 6b 6b 6b 6b 6b 6b .?.?`<.?kkkkkkkk
>
> That's 8 bytes of 0xf498f398 and 0xc1113c60. Doesn't look like much, but
> they're both valid kernel pointers, and the 0xf498f398 one is actually
> into the same page as the corruption, so it's a pointer to the same slab
> type (or at least same size). Which is a good hint in itself: we're
> looking at a list or something.
>
> And it's at offset 16 in the structure.
>
> That's almost certainly a "struct bdi_work", and the use-aftr-free thing
> is the "struct rcu_head rcu_head" part of it. That first thing (pointer to
> the same page) is 'next', and the second thing is a pointer to kernel text
> (and I can pretty much guarantee that 0xc1113c60 is 'bdi_work_free').
>
> So this is either a fs/fs-writeback.c bug, or it's a problem with RCU.
> Both of them are new or hugely changed since 2.6.31.

If this run had used CONFIG_TREE_PREEMPT_RCU rather than the
CONFIG_TREE_RCU that it actually had used, I would suggest applying the
patchset I submitted yesterday (Sept 13).

http://thread.gmane.org/gmane.linux.kernel/888803

Will take a look, regardless.

Thanx, Paul
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Jens Axboe on
On Mon, Sep 14 2009, Paul E. McKenney wrote:
> On Mon, Sep 14, 2009 at 07:40:27AM -0700, Linus Torvalds wrote:
> >
> >
> > On Mon, 14 Sep 2009, Ingo Molnar wrote:
> > >
> > > BUG kmalloc-64: Poison overwritten
> > > -----------------------------------------------------------------------------
> > >
> > > INFO: 0xf498f6a0-0xf498f6a7. First byte 0x90 instead of 0x6b
> > > INFO: Allocated in bdi_alloc_work+0x2b/0x100 age=175 cpu=1 pid=3514
> > > INFO: Freed in bdi_work_free+0x45/0x60 age=9 cpu=1 pid=3509
> > > INFO: Slab 0xc3257d84 objects=36 used=11 fp=0xf498f690 flags=0x400000c3
> > > INFO: Object 0xf498f690 @offset=1680 fp=0xf498fe00
> > >
> > > Bytes b4 0xf498f680: ab 0d 00 00 9c 27 ff ff 5a 5a 5a 5a 5a 5a 5a 5a ?....'??ZZZZZZZZ
> > > Object 0xf498f690: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
> > > Object 0xf498f6a0: 90 f3 98 f4 60 3c 11 c1 6b 6b 6b 6b 6b 6b 6b 6b .?.?`<.?kkkkkkkk
> >
> > That's 8 bytes of 0xf498f398 and 0xc1113c60. Doesn't look like much, but
> > they're both valid kernel pointers, and the 0xf498f398 one is actually
> > into the same page as the corruption, so it's a pointer to the same slab
> > type (or at least same size). Which is a good hint in itself: we're
> > looking at a list or something.
> >
> > And it's at offset 16 in the structure.
> >
> > That's almost certainly a "struct bdi_work", and the use-aftr-free thing
> > is the "struct rcu_head rcu_head" part of it. That first thing (pointer to
> > the same page) is 'next', and the second thing is a pointer to kernel text
> > (and I can pretty much guarantee that 0xc1113c60 is 'bdi_work_free').
> >
> > So this is either a fs/fs-writeback.c bug, or it's a problem with RCU.
> > Both of them are new or hugely changed since 2.6.31.
>
> If this run had used CONFIG_TREE_PREEMPT_RCU rather than the
> CONFIG_TREE_RCU that it actually had used, I would suggest applying the
> patchset I submitted yesterday (Sept 13).
>
> http://thread.gmane.org/gmane.linux.kernel/888803

Ingo, did it? I'll dive into this tonight, Linus' analysis and just a
general feel does point in the direction of the bdi work.

> Will take a look, regardless.

Thanks!

--
Jens Axboe

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Ingo Molnar on

* Jens Axboe <jens.axboe(a)oracle.com> wrote:

> On Mon, Sep 14 2009, Paul E. McKenney wrote:
> > On Mon, Sep 14, 2009 at 07:40:27AM -0700, Linus Torvalds wrote:
> > >
> > >
> > > On Mon, 14 Sep 2009, Ingo Molnar wrote:
> > > >
> > > > BUG kmalloc-64: Poison overwritten
> > > > -----------------------------------------------------------------------------
> > > >
> > > > INFO: 0xf498f6a0-0xf498f6a7. First byte 0x90 instead of 0x6b
> > > > INFO: Allocated in bdi_alloc_work+0x2b/0x100 age=175 cpu=1 pid=3514
> > > > INFO: Freed in bdi_work_free+0x45/0x60 age=9 cpu=1 pid=3509
> > > > INFO: Slab 0xc3257d84 objects=36 used=11 fp=0xf498f690 flags=0x400000c3
> > > > INFO: Object 0xf498f690 @offset=1680 fp=0xf498fe00
> > > >
> > > > Bytes b4 0xf498f680: ab 0d 00 00 9c 27 ff ff 5a 5a 5a 5a 5a 5a 5a 5a ?....'??ZZZZZZZZ
> > > > Object 0xf498f690: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
> > > > Object 0xf498f6a0: 90 f3 98 f4 60 3c 11 c1 6b 6b 6b 6b 6b 6b 6b 6b .?.?`<.?kkkkkkkk
> > >
> > > That's 8 bytes of 0xf498f398 and 0xc1113c60. Doesn't look like much, but
> > > they're both valid kernel pointers, and the 0xf498f398 one is actually
> > > into the same page as the corruption, so it's a pointer to the same slab
> > > type (or at least same size). Which is a good hint in itself: we're
> > > looking at a list or something.
> > >
> > > And it's at offset 16 in the structure.
> > >
> > > That's almost certainly a "struct bdi_work", and the use-aftr-free thing
> > > is the "struct rcu_head rcu_head" part of it. That first thing (pointer to
> > > the same page) is 'next', and the second thing is a pointer to kernel text
> > > (and I can pretty much guarantee that 0xc1113c60 is 'bdi_work_free').
> > >
> > > So this is either a fs/fs-writeback.c bug, or it's a problem with RCU.
> > > Both of them are new or hugely changed since 2.6.31.
> >
> > If this run had used CONFIG_TREE_PREEMPT_RCU rather than the
> > CONFIG_TREE_RCU that it actually had used, I would suggest applying
> > the patchset I submitted yesterday (Sept 13).
> >
> > http://thread.gmane.org/gmane.linux.kernel/888803
>
> Ingo, did it? [...]

The config i attached to the bugreport has:

#
# RCU Subsystem
#
CONFIG_TREE_RCU=y
# CONFIG_TREE_PREEMPT_RCU is not set
CONFIG_RCU_TRACE=y
CONFIG_RCU_FANOUT=64
CONFIG_RCU_FANOUT_EXACT=y
CONFIG_TREE_RCU_TRACE=y

So TREE_PREEMPT_RCU & the synchronize_rcu() bug Paul fixed is out.

> [...] I'll dive into this tonight, Linus' analysis and just a general
> feel does point in the direction of the bdi work.

Hard to tell whether it's BDI, RCU or something else - sadly this is the
only incident i've managed to log so far. (We'd be all much happier if
boxes crashed left and right! ;)

-tip's been carrying the RCU changes for a long(er) time which would
reduce the chance of this being RCU related. [ It's still possible
though: if it's a bug with a probability of hitting this box on these
workloads with a chance of 1:20,000 or worse. ]

Plus it triggered shortly after i updated -tip to latest -git which had
the BDI bits - which would indicate the BDI stuff - or just about
anything else in -git for that matter - or something older in -tip.
Every day without having hit this crash once more broadens the range of
plausible possibilities.

In any case, i'll refrain from trying to fit a line on a single point of
measurement ;-)

Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Jens Axboe on
On Tue, Sep 15 2009, Ingo Molnar wrote:
>
> * Jens Axboe <jens.axboe(a)oracle.com> wrote:
>
> > On Mon, Sep 14 2009, Paul E. McKenney wrote:
> > > On Mon, Sep 14, 2009 at 07:40:27AM -0700, Linus Torvalds wrote:
> > > >
> > > >
> > > > On Mon, 14 Sep 2009, Ingo Molnar wrote:
> > > > >
> > > > > BUG kmalloc-64: Poison overwritten
> > > > > -----------------------------------------------------------------------------
> > > > >
> > > > > INFO: 0xf498f6a0-0xf498f6a7. First byte 0x90 instead of 0x6b
> > > > > INFO: Allocated in bdi_alloc_work+0x2b/0x100 age=175 cpu=1 pid=3514
> > > > > INFO: Freed in bdi_work_free+0x45/0x60 age=9 cpu=1 pid=3509
> > > > > INFO: Slab 0xc3257d84 objects=36 used=11 fp=0xf498f690 flags=0x400000c3
> > > > > INFO: Object 0xf498f690 @offset=1680 fp=0xf498fe00
> > > > >
> > > > > Bytes b4 0xf498f680: ab 0d 00 00 9c 27 ff ff 5a 5a 5a 5a 5a 5a 5a 5a ?....'??ZZZZZZZZ
> > > > > Object 0xf498f690: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
> > > > > Object 0xf498f6a0: 90 f3 98 f4 60 3c 11 c1 6b 6b 6b 6b 6b 6b 6b 6b .?.?`<.?kkkkkkkk
> > > >
> > > > That's 8 bytes of 0xf498f398 and 0xc1113c60. Doesn't look like much, but
> > > > they're both valid kernel pointers, and the 0xf498f398 one is actually
> > > > into the same page as the corruption, so it's a pointer to the same slab
> > > > type (or at least same size). Which is a good hint in itself: we're
> > > > looking at a list or something.
> > > >
> > > > And it's at offset 16 in the structure.
> > > >
> > > > That's almost certainly a "struct bdi_work", and the use-aftr-free thing
> > > > is the "struct rcu_head rcu_head" part of it. That first thing (pointer to
> > > > the same page) is 'next', and the second thing is a pointer to kernel text
> > > > (and I can pretty much guarantee that 0xc1113c60 is 'bdi_work_free').
> > > >
> > > > So this is either a fs/fs-writeback.c bug, or it's a problem with RCU.
> > > > Both of them are new or hugely changed since 2.6.31.
> > >
> > > If this run had used CONFIG_TREE_PREEMPT_RCU rather than the
> > > CONFIG_TREE_RCU that it actually had used, I would suggest applying
> > > the patchset I submitted yesterday (Sept 13).
> > >
> > > http://thread.gmane.org/gmane.linux.kernel/888803
> >
> > Ingo, did it? [...]
>
> The config i attached to the bugreport has:
>
> #
> # RCU Subsystem
> #
> CONFIG_TREE_RCU=y
> # CONFIG_TREE_PREEMPT_RCU is not set
> CONFIG_RCU_TRACE=y
> CONFIG_RCU_FANOUT=64
> CONFIG_RCU_FANOUT_EXACT=y
> CONFIG_TREE_RCU_TRACE=y
>
> So TREE_PREEMPT_RCU & the synchronize_rcu() bug Paul fixed is out.

Yeah, I noticed later on. synchronize_rcu() is only used on exit as
well, so if it happened during boot it would have to be a call_rcu()
problem.

> > [...] I'll dive into this tonight, Linus' analysis and just a general
> > feel does point in the direction of the bdi work.
>
> Hard to tell whether it's BDI, RCU or something else - sadly this is the
> only incident i've managed to log so far. (We'd be all much happier if
> boxes crashed left and right! ;)

Indeed, that's much easier to test and fix!

> -tip's been carrying the RCU changes for a long(er) time which would
> reduce the chance of this being RCU related. [ It's still possible
> though: if it's a bug with a probability of hitting this box on these
> workloads with a chance of 1:20,000 or worse. ]
>
> Plus it triggered shortly after i updated -tip to latest -git which had
> the BDI bits - which would indicate the BDI stuff - or just about
> anything else in -git for that matter - or something older in -tip.
> Every day without having hit this crash once more broadens the range of
> plausible possibilities.

I haven't found anything here yet, but I'll keep playing. My RCU config
is the same as yours.

> In any case, i'll refrain from trying to fit a line on a single point of
> measurement ;-)

;-)

--
Jens Axboe

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Jens Axboe on
On Tue, Sep 15 2009, Ingo Molnar wrote:
>
> * Ingo Molnar <mingo(a)elte.hu> wrote:
>
> > Hard to tell whether it's BDI, RCU or something else - sadly this is
> > the only incident i've managed to log so far. (We'd be all much
> > happier if boxes crashed left and right! ;)
> >
> > -tip's been carrying the RCU changes for a long(er) time which would
> > reduce the chance of this being RCU related. [ It's still possible
> > though: if it's a bug with a probability of hitting this box on these
> > workloads with a chance of 1:20,000 or worse. ]
> >
> > Plus it triggered shortly after i updated -tip to latest -git which
> > had the BDI bits - which would indicate the BDI stuff - or just about
> > anything else in -git for that matter - or something older in -tip.
> > Every day without having hit this crash once more broadens the range
> > of plausible possibilities.
> >
> > In any case, i'll refrain from trying to fit a line on a single point
> > of measurement ;-)
>
> Ha! I should have checked all logs of today before writing that, not
> just that box's logs.
>
> Another testbox triggered the SLAB corruption yesternight:
>
> [ 13.598011] Freeing unused kernel memory: 2820k freed
> [ 13.602011] Write protecting the kernel read-only data: 13528k
> [ 13.649011] Not activating Mandatory Access Control now since /sbin/tomoyo-init doesn't exist.
> [ 14.391012] =============================================================================
> [ 14.391012] BUG kmalloc-96: Poison overwritten
> [ 14.391012] -----------------------------------------------------------------------------
> [ 14.391012]
> [ 14.391012] INFO: 0xffff88003da4a950-0xffff88003da4a988. First byte 0x0 instead of 0x6b
> [ 14.391012] INFO: Allocated in bdi_alloc_work+0x20/0x83 age=6 cpu=0 pid=3191
> [ 14.391012] INFO: Freed in bdi_work_free+0x1b/0x2f age=4 cpu=0 pid=3193
> [ 14.391012] INFO: Slab 0xffffea000190ae10 objects=24 used=13 fp=0xffff88003da4a930 flags=0x200000000000c3
> [ 14.391012] INFO: Object 0xffff88003da4a930 @offset=2352 fp=0xffff88003da4a888
> [ 14.391012]
> [ 14.391012] Bytes b4 0xffff88003da4a920: 52 a4 fb ff 00 00 00 00 5a 5a 5a 5a 5a 5a 5a 5a R���....ZZZZZZZZ
> [ 14.391012] Object 0xffff88003da4a930: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
> [ 14.391012] Object 0xffff88003da4a940: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
> [ 14.391012] Object 0xffff88003da4a950: 00 0c 47 3d 00 88 ff ff be 22 11 81 ff ff ff ff ..G=..���"..����
> [ 14.391012] Object 0xffff88003da4a960: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
> [ 14.391012] Object 0xffff88003da4a970: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
> [ 14.391012] Object 0xffff88003da4a980: 6b 6b 6b 6b 6b 6b 6b 6b 6a 6b 6b 6b 6b 6b 6b a5 kkkkkkkkjkkkkkk�
> [ 14.391012] Redzone 0xffff88003da4a990: bb bb bb bb bb bb bb bb ��������
> [ 14.391012] Padding 0xffff88003da4a9d0: 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZ
> [ 14.391012] Pid: 3193, comm: mount Not tainted 2.6.31-tip-02377-g78907f0-dirty #88876
> [ 14.391012] Call Trace:
> [ 14.391012] [<ffffffff810e6bef>] print_trailer+0x140/0x149
> [ 14.391012] [<ffffffff810e70db>] check_bytes_and_report+0xb7/0xf7
> [ 14.391012] [<ffffffff810e71ec>] check_object+0xd1/0x1b4
> [ 14.391012] [<ffffffff81112078>] ? bdi_alloc_work+0x20/0x83
> [ 14.391012] [<ffffffff810e7bee>] alloc_debug_processing+0x7b/0xf7
> [ 14.391012] [<ffffffff810e9842>] __slab_alloc+0x23e/0x282
> [ 14.391012] [<ffffffff81112078>] ? bdi_alloc_work+0x20/0x83
> [ 14.391012] [<ffffffff81112078>] ? bdi_alloc_work+0x20/0x83
> [ 14.391012] [<ffffffff810e9a8f>] kmem_cache_alloc+0xa1/0x13f
> [ 14.391012] [<ffffffff81112078>] bdi_alloc_work+0x20/0x83
> [ 14.391012] [<ffffffff81112ae2>] bdi_writeback_all+0x66/0x133
> [ 14.391012] [<ffffffff810788b8>] ? mark_held_locks+0x4d/0x6b
> [ 14.391012] [<ffffffff81872268>] ? __mutex_unlock_slowpath+0x12d/0x163
> [ 14.391012] [<ffffffff81078b45>] ? trace_hardirqs_on_caller+0x11c/0x140
> [ 14.391012] [<ffffffff815c35e5>] ? usbfs_fill_super+0x0/0xa8
> [ 14.391012] [<ffffffff81112c87>] writeback_inodes_sb+0x75/0x83
> [ 14.391012] [<ffffffff8111652d>] __sync_filesystem+0x30/0x6b
> [ 14.391012] [<ffffffff81116705>] sync_filesystem+0x3a/0x51
> [ 14.391012] [<ffffffff810f4b23>] do_remount_sb+0x5b/0x11f
> [ 14.391012] [<ffffffff810f59a3>] get_sb_single+0x92/0xad
> [ 14.391012] [<ffffffff815c2f01>] usb_get_sb+0x1b/0x1d
> [ 14.391012] [<ffffffff810f5786>] vfs_kern_mount+0x9e/0x122
> [ 14.391012] [<ffffffff810f5871>] do_kern_mount+0x4c/0xec
> [ 14.391012] [<ffffffff8110dd15>] do_mount+0x1e9/0x236
> [ 14.391012] [<ffffffff8110dde6>] sys_mount+0x84/0xc6
> [ 14.391012] [<ffffffff8187333a>] ? trace_hardirqs_on_thunk+0x3a/0x3f
> [ 14.391012] [<ffffffff8100b642>] system_call_fastpath+0x16/0x1b
> [ 14.391012] FIX kmalloc-96: Restoring 0xffff88003da4a950-0xffff88003da4a988=0x6b
> [ 14.391012]
> [ 14.391012] FIX kmalloc-96: Marking all objects used
> [ 20.597016] usb usb2: uevent
> [ 20.606016] usb 2-0:1.0: uevent
> [ 20.615016] usb usb3: uevent
> [ 20.622016] usb 3-0:1.0: uevent
> [ 20.631016] usb usb4: uevent
>
> Different hardware, different config, but still in bdi_alloc_work().
>
> - Which excludes cosmic rays and freak hardware from the list of
> possibilities.
>
> - I'd also say RCU is out too as this incident was 500 iterations after
> the BDI merge - preceded by a streak of 3000+ successful iterations on
> that same box with all of -tip (including the RCU changes).
>
> - Random memory corruption is probably out as well - the chance of
> hitting a BDI data structure twice accidentally is low.
>
> - It's also two completely different versions of distros - the
> user-space of the two testboxes affected is 2 years apart or so.
>
> - It's a single CPU box - SMP races are out as well.
>
> This points towards this being a BDI bug with about ~80% confidence
> statistically - or [with a lower probability] a SLAB bug. (both failing
> configs had CONFIG_SLUB=y. But SLUB is the best in detecting corrupted
> data structures so that alone does not tell much.)

Hmmm, at least this reproduces more consistently. Can I talk you into
trying to pull in:

git://git.kernel.dk/linux-2.6-block.git writeback

and see if it reproduces there? That path has been cleaned up
considerably there.

--
Jens Axboe

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/