From: Alan Stern on
On Mon, 15 Feb 2010, Rafael J. Wysocki wrote:

> On Monday 15 February 2010, Maxim Levitsky wrote:
> > On Sat, 2010-02-13 at 15:29 +0200, Maxim Levitsky wrote:
> > > I noticed that currently calling del_gendisk leads to a sure deadlock if
> > > attempted from .suspend or .resume functions.
>
> Well, it shouldn't be called from there, then.

Even if drivers avoid calling it from within suspend methods, they have
to be able to call it from within resume methods. After all, the
resume method may find that the disk's device has vanished.

> > > Something like that:
> > >
> > > [<ffffffff8106620a>] ? prepare_to_wait+0x2a/0x90
> > > [<ffffffff810790bd>] ? trace_hardirqs_on+0xd/0x10
> > > [<ffffffff8140db12>] ? _raw_spin_unlock_irqrestore+0x42/0x80
> > > [<ffffffff8112a390>] ? bdi_sched_wait+0x0/0x20
> > > [<ffffffff8112a39e>] bdi_sched_wait+0xe/0x20
> > > [<ffffffff8140af6f>] __wait_on_bit+0x5f/0x90
> > > [<ffffffff8112a390>] ? bdi_sched_wait+0x0/0x20
> > > [<ffffffff8140b018>] out_of_line_wait_on_bit+0x78/0x90
> > > [<ffffffff81065fd0>] ? wake_bit_function+0x0/0x40
> > > [<ffffffff8112a2d3>] ? bdi_queue_work+0xa3/0xe0
> > > [<ffffffff8112a37f>] bdi_sync_writeback+0x6f/0x80
> > > [<ffffffff8112a3d2>] sync_inodes_sb+0x22/0x120
> > > [<ffffffff8112f1d2>] __sync_filesystem+0x82/0x90
> > > [<ffffffff8112f3db>] sync_filesystem+0x4b/0x70
> > > [<ffffffff811391de>] fsync_bdev+0x2e/0x60
> > > [<ffffffff812226be>] invalidate_partition+0x2e/0x50
> > > [<ffffffff8116b92f>] del_gendisk+0x3f/0x140
> > > [<ffffffffa00c0233>] mmc_blk_remove+0x33/0x60 [mmc_block]
> > > [<ffffffff81338977>] mmc_bus_remove+0x17/0x20
> > > [<ffffffff812ce746>] __device_release_driver+0x66/0xc0
> > > [<ffffffff812ce89d>] device_release_driver+0x2d/0x40
> > > [<ffffffff812cd9b5>] bus_remove_device+0xb5/0x120
> > > [<ffffffff812cb46f>] device_del+0x12f/0x1a0
> > > [<ffffffff81338a5b>] mmc_remove_card+0x5b/0x90
> > > [<ffffffff8133ac27>] mmc_sd_remove+0x27/0x50
> > > [<ffffffff81337d8c>] mmc_resume_host+0x10c/0x140
> > > [<ffffffffa00850e9>] sdhci_resume_host+0x69/0xa0 [sdhci]
> > > [<ffffffffa0bdc39e>] sdhci_pci_resume+0x8e/0xb0 [sdhci_pci]
> > >
> > > bdi_queue_work seems to be the problem.
> > >
> > > Some device drivers need to remove their cards logically in .suspend,
> > > because the card is removable and can be changed while the system is
> > > suspended.
>
> I don't know how to resolve this right now.

This is a matter for Jens. Is the bdi writeback task freezable? If it
is, should it be made unfreezable?

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Maxim Levitsky on
On Tue, 2010-02-16 at 11:27 -0500, Alan Stern wrote:
> On Mon, 15 Feb 2010, Rafael J. Wysocki wrote:
>
> > On Monday 15 February 2010, Maxim Levitsky wrote:
> > > On Sat, 2010-02-13 at 15:29 +0200, Maxim Levitsky wrote:
> > > > I noticed that currently calling del_gendisk leads to a sure deadlock if
> > > > attempted from .suspend or .resume functions.
> >
> > Well, it shouldn't be called from there, then.
>
> Even if drivers avoid calling it from within suspend methods, they have
> to be able to call it from within resume methods. After all, the
> resume method may find that the disk's device has vanished.
>
> > > > Something like that:
> > > >
> > > > [<ffffffff8106620a>] ? prepare_to_wait+0x2a/0x90
> > > > [<ffffffff810790bd>] ? trace_hardirqs_on+0xd/0x10
> > > > [<ffffffff8140db12>] ? _raw_spin_unlock_irqrestore+0x42/0x80
> > > > [<ffffffff8112a390>] ? bdi_sched_wait+0x0/0x20
> > > > [<ffffffff8112a39e>] bdi_sched_wait+0xe/0x20
> > > > [<ffffffff8140af6f>] __wait_on_bit+0x5f/0x90
> > > > [<ffffffff8112a390>] ? bdi_sched_wait+0x0/0x20
> > > > [<ffffffff8140b018>] out_of_line_wait_on_bit+0x78/0x90
> > > > [<ffffffff81065fd0>] ? wake_bit_function+0x0/0x40
> > > > [<ffffffff8112a2d3>] ? bdi_queue_work+0xa3/0xe0
> > > > [<ffffffff8112a37f>] bdi_sync_writeback+0x6f/0x80
> > > > [<ffffffff8112a3d2>] sync_inodes_sb+0x22/0x120
> > > > [<ffffffff8112f1d2>] __sync_filesystem+0x82/0x90
> > > > [<ffffffff8112f3db>] sync_filesystem+0x4b/0x70
> > > > [<ffffffff811391de>] fsync_bdev+0x2e/0x60
> > > > [<ffffffff812226be>] invalidate_partition+0x2e/0x50
> > > > [<ffffffff8116b92f>] del_gendisk+0x3f/0x140
> > > > [<ffffffffa00c0233>] mmc_blk_remove+0x33/0x60 [mmc_block]
> > > > [<ffffffff81338977>] mmc_bus_remove+0x17/0x20
> > > > [<ffffffff812ce746>] __device_release_driver+0x66/0xc0
> > > > [<ffffffff812ce89d>] device_release_driver+0x2d/0x40
> > > > [<ffffffff812cd9b5>] bus_remove_device+0xb5/0x120
> > > > [<ffffffff812cb46f>] device_del+0x12f/0x1a0
> > > > [<ffffffff81338a5b>] mmc_remove_card+0x5b/0x90
> > > > [<ffffffff8133ac27>] mmc_sd_remove+0x27/0x50
> > > > [<ffffffff81337d8c>] mmc_resume_host+0x10c/0x140
> > > > [<ffffffffa00850e9>] sdhci_resume_host+0x69/0xa0 [sdhci]
> > > > [<ffffffffa0bdc39e>] sdhci_pci_resume+0x8e/0xb0 [sdhci_pci]
> > > >
> > > > bdi_queue_work seems to be the problem.
> > > >
> > > > Some device drivers need to remove their cards logically in .suspend,
> > > > because the card is removable and can be changed while the system is
> > > > suspended.
> >
> > I don't know how to resolve this right now.
>
> This is a matter for Jens. Is the bdi writeback task freezable? If it
> is, should it be made unfreezable?

Any update?

Best regards,
Maxim Levitsky



From: Alan Stern on
On Tue, 23 Feb 2010, Jens Axboe wrote:

> On Tue, Feb 16 2010, Alan Stern wrote:
> > On Mon, 15 Feb 2010, Rafael J. Wysocki wrote:
> >
> > > On Monday 15 February 2010, Maxim Levitsky wrote:
> > > > On Sat, 2010-02-13 at 15:29 +0200, Maxim Levitsky wrote:
> > > > > I noticed that currently calling del_gendisk leads to a sure deadlock if
> > > > > attempted from .suspend or .resume functions.
> > >
> > > Well, it shouldn't be called from there, then.
> >
> > Even if drivers avoid calling it from within suspend methods, they have
> > to be able to call it from within resume methods. After all, the
> > resume method may find that the disk's device has vanished.
>
> del_gendisk() needs process context at least, since it'll sleep (not
> just for sync/invalidate, but other parts of the destruction as well).

That's not a problem; suspend and resume run in process context.

> > This is a matter for Jens. Is the bdi writeback task freezable? If it
> > is, should it be made unfreezable?
>
> I'm not a big expert on what tasks should be freezable or not. As it
> stands, the writeback tasks will attempt to freeze and thaw with the
> system. I guess that screws the sync from resume call, since it's not
> running and the sync will wait for it to retrieve and finish that work
> item.
>
> To the suspend experts - can we safely mark the writeback tasks as
> non-freezable?

The reason for freezing those tasks is to avoid writebacks at random
times during a system sleep transition, when the underlying device may
already be suspended, right?

In principle, a device's writeback task could be unfrozen immediately
after the device is resumed. In practice this might not solve the
problem, since the del_gendisk() call occurs _within_ the device's
resume routine. I suppose del_gendisk() could be made responsible for
unfreezing the writeback task.

The best solution would be to have del_gendisk() avoid waiting for the
writeback task in cases where the underlying device has been removed.
I don't know if that is feasible, however.

Alan Stern

P.S.: Jens, given a pointer to a struct gendisk or to a struct
request_queue, is there a good way to tell whether there are any dirty
buffers for that device waiting to be written out? This is for
purposes of runtime power management -- in the initial implementation,
I want to avoid powering-down a block device if it is open or has any
dirty buffers. In other words, only completely idle devices should be
powered down (a good example would be a card reader with no memory card
inserted).

From: Alan Stern on
On Tue, 23 Feb 2010, Jens Axboe wrote:

> On Tue, Feb 23 2010, Alan Stern wrote:
> > > > This is a matter for Jens. Is the bdi writeback task freezable? If it
> > > > is, should it be made unfreezable?
> > >
> > > I'm not a big expert on what tasks should be freezable or not. As it
> > > stands, the writeback tasks will attempt to freeze and thaw with the
> > > system. I guess that screws the sync from resume call, since it's not
> > > running and the sync will wait for it to retrieve and finish that work
> > > item.
> > >
> > > To the suspend experts - can we safely mark the writeback tasks as
> > > non-freezable?
> >
> > The reason for freezing those tasks is to avoid writebacks at random
> > times during a system sleep transition, when the underlying device may
> > already be suspended, right?
>
> Right, or at least it would seem pointless to have them running while
> the device is suspended. But my point was that if it's easier (and
> feasible) to just leave them running, perhaps that is the way to go.

I don't have a clear picture of how the block layer operates. For
example, what is the reason for this comment in the definition of
struct gendisk?

	struct device *driverfs_dev;  // FIXME: remove

Isn't that crucial for making a disk show up in sysfs? Is the comment
out of date?

A possible approach is to add suspend and resume methods for this
driverfs_dev, and make them be responsible for stopping and restarting
the writeback task instead of relying on the freezer. Then
del_gendisk() could cleanly restart the task when necessary.

> > In principle, a device's writeback task could be unfrozen immediately
> > after the device is resumed. In practice this might not solve the
> > problem, since the del_gendisk() call occurs _within_ the device's
> > resume routine. I suppose del_gendisk() could be made responsible for
> > unfreezing the writeback task.
>
> And that's back to the question of whether or not that is a nice thing to
> do. It seems a bit dirty, but otoh where else to do it. Perhaps just
> using the kblockd to postpone the del_gendisk() to out-of-resume context
> would be the best approach.

That would involve a layering violation, wouldn't it? Either the
driver would have to interface with kblockd directly, or else
del_gendisk() would need to know whether the writeback task was frozen.

On the whole, I think it's best for the block layer to retain full
control over its own tasks and requirements.

Alan Stern

From: Alan Stern on
On Tue, 23 Feb 2010, Jens Axboe wrote:

> > > And that's back to the question of whether or not that is a nice thing to
> > > do. It seems a bit dirty, but otoh where else to do it. Perhaps just
> > > using the kblockd to postpone the del_gendisk() to out-of-resume context
> > > would be the best approach.
> >
> > That would involve a layering violation, wouldn't it? Either the
> > driver would have to interface with kblockd directly, or else
> > del_gendisk() would need to know whether the writeback task was frozen.
> >
> > On the whole, I think it's best for the block layer to retain full
> > control over its own tasks and requirements.
>
> You would export such functionality - del_gendisk_deferred(), or
> something like that. The kblockd suggestion was implementation detail,
> not something the driver would concern itself with. It's not exactly
> picture perfect, but it could be used from eg resume context where the
> device isn't fully live yet.

Hmm. There's still no way for the driver to know whether or not the
writeback task is frozen when it wants to call del_gendisk(). It
would have to defer _all_ such calls. And all hot-pluggable block
drivers would have to do this -- would that be acceptable?
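
To make the deferral idea concrete, here is a minimal userspace sketch.
It assumes a fixed-size pending list in place of kblockd, and every name
in it (model_disk, model_del_gendisk_deferred, model_run_deferred) is
hypothetical; del_gendisk_deferred() itself is only a suggested name in
this thread, not an existing function.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_DEFERRED 8

/* Hypothetical stand-in for a registered disk. */
struct model_disk {
	int registered;		/* 1 while the disk is still visible */
};

static struct model_disk *deferred[MAX_DEFERRED];
static size_t n_deferred;

/*
 * Called from a resume method: record the request and return at once,
 * without waiting on the (possibly frozen) writeback task.
 * Returns 0 on success, -1 if the pending list is full.
 */
static int model_del_gendisk_deferred(struct model_disk *d)
{
	if (n_deferred == MAX_DEFERRED)
		return -1;
	deferred[n_deferred++] = d;
	return 0;
}

/*
 * Called once resume has finished and tasks are thawed: perform the
 * real teardown for every queued disk. Returns how many were removed.
 */
static size_t model_run_deferred(void)
{
	size_t done = 0;

	for (size_t i = 0; i < n_deferred; i++) {
		assert(deferred[i] != NULL);
		deferred[i]->registered = 0;
		done++;
	}
	n_deferred = 0;
	return done;
}
```

The sketch also shows the cost raised above: the disk stays registered
until the deferred pass runs, and a driver that cannot tell whether the
writeback task is frozen would have to route every removal through it.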

How about plugging the request queue instead of freezing the writeback
task? Would that work? It should be easy enough for a driver to
unplug the queue before unregistering its device from within a resume
method.

Alan Stern
