From: Eric Miao on
On Feb 6, 4:00 am, Maxim Levitsky <maximlevit...(a)gmail.com> wrote:
> On Fri, 2010-02-05 at 10:26 -0800, Andrew Morton wrote:
> > On Fri, 05 Feb 2010 17:52:00 +0200
> > Maxim Levitsky <maximlevit...(a)gmail.com> wrote:
>
> > > > > <4>[15241.042047]  [<ffffffff8106620a>] ? prepare_to_wait+0x2a/0x90
> > > > > <4>[15241.042159]  [<ffffffff810790bd>] ? trace_hardirqs_on+0xd/0x10
> > > > > <4>[15241.042271]  [<ffffffff8140db12>] ? _raw_spin_unlock_irqrestore+0x42/0x80
> > > > > <4>[15241.042386]  [<ffffffff8112a390>] ? bdi_sched_wait+0x0/0x20
> > > > > <4>[15241.042496]  [<ffffffff8112a39e>] bdi_sched_wait+0xe/0x20
> > > > > <4>[15241.042606]  [<ffffffff8140af6f>] __wait_on_bit+0x5f/0x90
> > > > > <4>[15241.042714]  [<ffffffff8112a390>] ? bdi_sched_wait+0x0/0x20
> > > > > <4>[15241.042824]  [<ffffffff8140b018>] out_of_line_wait_on_bit+0x78/0x90
> > > > > <4>[15241.042935]  [<ffffffff81065fd0>] ? wake_bit_function+0x0/0x40
> > > > > <4>[15241.043045]  [<ffffffff8112a2d3>] ? bdi_queue_work+0xa3/0xe0
> > > > > <4>[15241.043155]  [<ffffffff8112a37f>] bdi_sync_writeback+0x6f/0x80
> > > > > <4>[15241.043265]  [<ffffffff8112a3d2>] sync_inodes_sb+0x22/0x120
> > > > > <4>[15241.043375]  [<ffffffff8112f1d2>] __sync_filesystem+0x82/0x90
> > > > > <4>[15241.043485]  [<ffffffff8112f3db>] sync_filesystem+0x4b/0x70
> > > > > <4>[15241.043594]  [<ffffffff811391de>] fsync_bdev+0x2e/0x60
> > > > > <4>[15241.043704]  [<ffffffff812226be>] invalidate_partition+0x2e/0x50
> > > > > <4>[15241.043816]  [<ffffffff8116b92f>] del_gendisk+0x3f/0x140
> > > > > <4>[15241.043926]  [<ffffffffa00c0233>] mmc_blk_remove+0x33/0x60 [mmc_block]
> > > > > <4>[15241.044043]  [<ffffffff81338977>] mmc_bus_remove+0x17/0x20
> > > > > <4>[15241.044152]  [<ffffffff812ce746>] __device_release_driver+0x66/0xc0
> > > > > <4>[15241.044264]  [<ffffffff812ce89d>] device_release_driver+0x2d/0x40
> > > > > <4>[15241.044375]  [<ffffffff812cd9b5>] bus_remove_device+0xb5/0x120
> > > > > <4>[15241.044486]  [<ffffffff812cb46f>] device_del+0x12f/0x1a0
> > > > > <4>[15241.044593]  [<ffffffff81338a5b>] mmc_remove_card+0x5b/0x90
> > > > > <4>[15241.044702]  [<ffffffff8133ac27>] mmc_sd_remove+0x27/0x50
> > > > > <4>[15241.044811]  [<ffffffff81337d8c>] mmc_resume_host+0x10c/0x140
> > > > > <4>[15241.044929]  [<ffffffffa00850e9>] sdhci_resume_host+0x69/0xa0 [sdhci]
> > > > > <4>[15241.045044]  [<ffffffffa0bdc39e>] sdhci_pci_resume+0x8e/0xb0 [sdhci_pci]
>
> > > > So what's the hang?  del_gendisk is doing IO?  I'd assumed that it was
> > > > because it was calling kobject_uevent, but userspace is frozen.
>
> > > This is a backtrace of a hang.
>
> > But why did it hang?  Because the BDI worker threads are trying to
> > perform IO through a suspended device?
>
> Something like that I guess.
> Also this is 100% reproducible, and I can reproduce this with my own
> driver too (by making the card detection workqueue be non freezable)
>

It looks to me like the bdi code is waiting for the writeback task to
finish, yet processes are frozen, so that never happens and the sync
hangs.
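
To make the suspected sequence concrete, here is a toy user-space model in
C. It is not kernel code and does not use any kernel API: the names
model_work, model_worker, worker_tick and wait_for_work are all made up for
illustration, and the tick limit exists only so the model terminates (the
wait seen in the backtrace has no such bound). It only shows the shape of
the problem as guessed above: the waiter depends on a worker that cannot
run while tasks are frozen.

```c
#include <stdbool.h>

/* Hypothetical model types, invented for this sketch. */
struct model_work {
    bool pending;   /* stands in for the bit the waiter sleeps on */
};

struct model_worker {
    bool frozen;    /* true while tasks are frozen for suspend */
};

/* One scheduling tick: a frozen worker makes no progress at all. */
static void worker_tick(const struct model_worker *w, struct model_work *work)
{
    if (!w->frozen)
        work->pending = false;  /* work done, waiter may proceed */
}

/*
 * Wait for the work to complete, giving the worker up to max_ticks ticks.
 * Returns true if the work finished, false if the waiter would still be
 * blocked when the tick budget ran out.
 */
static bool wait_for_work(const struct model_worker *w,
                          struct model_work *work, int max_ticks)
{
    for (int i = 0; i < max_ticks && work->pending; i++)
        worker_tick(w, work);
    return !work->pending;
}
```

With a running worker, wait_for_work() returns true after the first tick.
With frozen set, it returns false however large max_ticks is, which is the
model's equivalent of the hang.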

And I can confirm this always happens. Without MMC_UNSAFE_RESUME, it
happens on suspend, when the mmc core tries to remove the card. With
MMC_UNSAFE_RESUME, it happens on resume if the card was removed during
suspend.
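
The two cases can be written down as a small decision function. This is
only a restatement of the observation above in C, not code from the mmc
core; the enum hang_point and the function expected_hang_point are
hypothetical names used for illustration.

```c
#include <stdbool.h>

/* Hypothetical names, invented for this sketch. */
enum hang_point {
    HANG_NONE,        /* no card removal runs in suspend/resume context */
    HANG_ON_SUSPEND,  /* card is removed while suspending */
    HANG_ON_RESUME    /* card is found missing and removed while resuming */
};

/*
 * unsafe_resume: whether MMC_UNSAFE_RESUME is enabled.
 * card_removed_while_suspended: whether the card was pulled during suspend.
 */
static enum hang_point expected_hang_point(bool unsafe_resume,
                                           bool card_removed_while_suspended)
{
    if (!unsafe_resume)
        return HANG_ON_SUSPEND;  /* mmc core removes the card on suspend */
    return card_removed_while_suspended ? HANG_ON_RESUME : HANG_NONE;
}
```

Only the combination of MMC_UNSAFE_RESUME and a card left in place avoids
the removal path in this model.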

The root cause, though, looks to me to be that del_gendisk() is not safe
to call in suspend context, so a clean fix probably belongs somewhere in
the generic disk layer. Skipping card removal during suspend might, IMHO,
not be a clean enough fix for this problem.

I might be able to avoid this issue by removing the card from the
user-space pm scripts, but it would be a shame if this cannot be fixed
cleanly within the kernel.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/