Is it supposed to be ok to call del_gendisk while userspace is frozen? [Kernel]

Prev: power_supply: bq27x00: remove double endian swap
Next: power_supply: bq27x00: fix temperature conversion

From: Alan Stern on 24 Feb 2010 15:20

On Wed, 24 Feb 2010, Jens Axboe wrote:

> > How about plugging the request queue instead of freezing the writeback
> > task? Would that work? It should be easy enough for a driver to
> > unplug the queue before unregistering its device from within a resume
> > method.
>
> We have specific methods for either freezing of stopping or starting the
> queue, perhaps those would be appropriate for suspend/resume actions. It
> effectively prevents the queueing function from being called. If there
> are dirty pages for the device, then it would not help though, as you
> would still get stuck waiting for that IO to complete.

If the resume method would restart the queue before unregistering the
device, pending dirty pages wouldn't cause any problems. They'd get
sent down to the driver and rejected immediately because the device was
dead or done.

The difficulty with this approach is that it requires individual
attention for each block device driver. Either the driver has to
freeze/stop/plug the queue during suspend (and restart it during
resume) or else the device's writeback task has to be frozen.

Can this be encapsulated by a function in the block layer? For
example, drivers could call blk_set_hot_unpluggable(bdev) for devices
that might need to be unregistered during resume. Then they would
become responsible for managing the device's queue.

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

From: Alan Stern on 1 Mar 2010 10:30

On Mon, 1 Mar 2010, Pavel Machek wrote:

> Hi!
>
> > > > This is a matter for Jens. Is the bdi writeback task freezable? If it
> > > > is, should it be made unfreezable?
> > >
> > > I'm not a big expect on what tasks should be freezable or not. As it
> > > stands, the writeback tasks will attempt to freeze and thaw with the
> > > system. I guess that screws the sync from resume call, since it's not
> > > running and the sync will wait for it to retrieve and finish that work
> > > item.
> > >
> > > To the suspend experts - can we safely mark the writeback tasks as
> > > non-freezable?
> >
> > The reason for freezing those tasks is to avoid writebacks at random
> > times during a system sleep transition, when the underlying device may
> > already be suspended, right?
>
> It is also there to avoid inconsistency between in-filesystem data and
> snapshot in hibernation image.

A good point, although in this case I think it won't matter. Writing
out a dirty page twice (once right after taking the snapshot and then
again after resuming from hibernation) will leave the disk in a correct
state.

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

From: Alan Stern on 3 Mar 2010 17:30

On Wed, 3 Mar 2010, Pavel Machek wrote:

> Hi!
>
> > > > The reason for freezing those tasks is to avoid writebacks at random
> > > > times during a system sleep transition, when the underlying device may
> > > > already be suspended, right?
> > >
> > > It is also there to avoid inconsistency between in-filesystem data and
> > > snapshot in hibernation image.
> >
> > A good point, although in this case I think it won't matter. Writing
> > out a dirty page twice (once right after taking the snapshot and then
> > again after resuming from hibernation) will leave the disk in a correct
> > state.
>
> No, I don't think so. Have you considered all the various journalling
> systems?
>
> Definitely not in presence of I/O errors. Commit block can only be
> written after previous blocks are successfully writen to the journal.
>
> So lets see:
>
> <snapshot>
>
> Write previous block, write commit block, write more blocks
>
> <hibernation powerdown, restart>
>
> Error writing previous block (block now contains garbage), leading to
> kernel panic
>
> <restart>
>
> journalling assumptions broken: commit block is there, but previous
> blocks are not intact. Data loss.
>
> ...and that was the first I could think about. Lets not do
> this. Barriers were invented for a reason.

Very well. Then we still need a solution to the original problem:
Devices sometimes need to be unregistered during resume, but
del_gendisk() blocks on the writeback thread, which is frozen until
after the resume finishes. How do you suggest this be fixed?

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

From: Alan Stern on 3 Mar 2010 21:50

On Thu, 4 Mar 2010, Rafael J. Wysocki wrote:

> > Very well. Then we still need a solution to the original problem:
> > Devices sometimes need to be unregistered during resume, but
> > del_gendisk() blocks on the writeback thread, which is frozen until
> > after the resume finishes. How do you suggest this be fixed?
>
> I thought about thawing the writeback thread earlier in such cases.
>
> Would that makes sense / is it doable at all?

My thought exactly. This is the only approach that also solves the
following race:

A driver is unloaded at the same time as a suspend starts.

The writeback thread gets frozen.

Then before the rmmod thread is frozen, it calls del_gendisk.

Delaying things by means of a workqueue (or the equivalent) might also
work, but it doesn't seem as safe. For example, some important
writebacks might end up getting delayed until too late.

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

From: Alan Stern on 4 Mar 2010 14:40

On Thu, 4 Mar 2010, Rafael J. Wysocki wrote:

> On Thursday 04 March 2010, Alan Stern wrote:
> > On Thu, 4 Mar 2010, Rafael J. Wysocki wrote:
> >
> > > > Very well. Then we still need a solution to the original problem:
> > > > Devices sometimes need to be unregistered during resume, but
> > > > del_gendisk() blocks on the writeback thread, which is frozen until
> > > > after the resume finishes. How do you suggest this be fixed?
> > >
> > > I thought about thawing the writeback thread earlier in such cases.
> > >
> > > Would that makes sense / is it doable at all?
> >
> > My thought exactly. This is the only approach that also solves the
> > following race:
> >
> > A driver is unloaded at the same time as a suspend starts.
> >
> > The writeback thread gets frozen.
> >
> > Then before the rmmod thread is frozen, it calls del_gendisk.
> >
> > Delaying things by means of a workqueue (or the equivalent) might also
> > work, but it doesn't seem as safe. For example, some important
> > writebacks might end up getting delayed until too late.
>
> OK, so what exactly should trigger the thawing of the writeback thread?
>
> Should we do that unconditionally at one point or wait for a specific situation
> to happen, and how to recognize that situation in the latter case?

My thought was that del_gendisk would do this unconditionally. It
would make the task unfreezable and then thaw it, to prevent a race.

This assumes that del_gendisk never gets called in the middle of a
sleep transition unless the device is really gone. If that's not the
case, Pavel's suggestion (make the resume routine do the device removal
asynchronously) might be better.

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

First | Prev | Next | Last
Pages: 1 2 3
Prev: power_supply: bq27x00: remove double endian swap
Next: power_supply: bq27x00: fix temperature conversion