block: fix leaks associated with discard request payload [Kernel]

Prev: [PATCH] vhost: break out of polling loop on error
Next: block: defer the use of inline biovecs for discard requests

From: Christoph Hellwig on 27 Jun 2010 05:30

On Sun, Jun 27, 2010 at 05:49:29PM +0900, FUJITA Tomonori wrote:
> On Sat, 26 Jun 2010 15:56:50 -0400
> Mike Snitzer <snitzer(a)redhat.com> wrote:
>
> > Fix leaks introduced via "block: don't allocate a payload for discard
> > request" commit a1d949f5f44.
> >
> > sd_done() is not called for REQ_TYPE_BLOCK_PC commands so cleanup
> > discard request's payload directly in scsi_finish_command().
>
> Instead of adding another discard hack to scsi_finish_command(), how
> about converting discard to REQ_TYPE_FS request? discard is FS request
> from the perspective of the block layer. It also fixes a problem that
> discard isn't retried in the case of UNIT ATTENTION.
>
> I think that we can get more cleaner code if we handle discard as
> normal (fs) request in the block layer (and scsi-ml). We need more
> changes but this patch is the first step.

Making discard a REQ_TYPE_FS inside scsi (it already is before entering
sd_prep_fn) means we'll need to special case it all over the I/O
submission and completion path. Having the payload length not matching
the transfer length is something we don't expect for FS requests.

> index e16185b..9e15c46 100644
> --- a/block/blk-lib.c
> +++ b/block/blk-lib.c
> @@ -20,6 +20,10 @@ static void blkdev_discard_end_io(struct bio *bio, int err)
> if (bio->bi_private)
> complete(bio->bi_private);
>
> + /* free the page that the lower layer allocated */
> + if (bio_page(bio))
> + __free_page(bio_page(bio));
> +

This is exactly what this patchkit gets rid off. Having a payload
page that the caller tracks (previously fully, with this patch only for
freeing) makes DM's life a lot harder. Remember we don't actually store
any payload in there before entering sd_prep_fn - it's just that the
scsi commands implementing discards need some payload - either a sector
sizes zero filled buffer for WRITE SAME, or an LBA/len encoding inside
the payload for UNMAP.

> - rq->cmd_type = REQ_TYPE_BLOCK_PC;
> + rq->cmd_type = REQ_TYPE_FS;

No need to set REQ_TYPE_FS here, it's already set up that way.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

From: Christoph Hellwig on 27 Jun 2010 05:40

On Sat, Jun 26, 2010 at 03:56:50PM -0400, Mike Snitzer wrote:
> Fix leaks introduced via "block: don't allocate a payload for discard
> request" commit a1d949f5f44.
>
> sd_done() is not called for REQ_TYPE_BLOCK_PC commands so cleanup
> discard request's payload directly in scsi_finish_command().
>
> Also cleanup page allocated for discard payload in
> scsi_setup_discard_cmnd's scsi_setup_blk_pc_cmnd error path.

This looks good - I've tested an equivalent patch doing xfstests runs
for more than 24hours now without seeing leaks.

On the long run I'd prefer not having the knowledge of the payload
freeing inside the core scsi code, but that requires either calling
->done for block pc requests, or treating discards as fs requests all
the way, which will require much larger changes.

So for now:

Reviewed-by: Christoph Hellwig <hch(a)lst.de>

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

From: Christoph Hellwig on 27 Jun 2010 07:10

> How about this?

As I tried to explain before this utterly confuses the I/O completion
path. With the patch applied even a simple mkfs.xfs that issues discard
just hangs.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

From: Christoph Hellwig on 27 Jun 2010 11:40

On Sun, Jun 27, 2010 at 09:32:07PM +0900, FUJITA Tomonori wrote:
> On Sun, 27 Jun 2010 13:07:12 +0200
> Christoph Hellwig <hch(a)lst.de> wrote:
>
> > > How about this?
> >
> > As I tried to explain before this utterly confuses the I/O completion
> > path. With the patch applied even a simple mkfs.xfs that issues discard
> > just hangs.
>
> Wired. I just tried mkfs.xfs against scsi_debug with my block patches
> (I saw one discard command). Seemed that it worked fine.

Doesn't work for me against either a real SSD or WRITE SAME support in
qemu. But I think I didn't actually have your barrier patches applied,
so I'll try again with a clean tree.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

From: Christoph Hellwig on 28 Jun 2010 04:00

On Sun, Jun 27, 2010 at 09:32:07PM +0900, FUJITA Tomonori wrote:
> On Sun, 27 Jun 2010 13:07:12 +0200
> Christoph Hellwig <hch(a)lst.de> wrote:
>
> > > How about this?
> >
> > As I tried to explain before this utterly confuses the I/O completion
> > path. With the patch applied even a simple mkfs.xfs that issues discard
> > just hangs.
>
> Wired. I just tried mkfs.xfs against scsi_debug with my block patches
> (I saw one discard command). Seemed that it worked fine.

I've tracked it down to the call to scsi_requeue_command in scsi_end_request.
When the command is marked BLOCK_PC we'll just get it back as such in
->prep_fn next time, but now it's reverting to the previous state.
While I see the problems with leaking ressources in that case I still
can't quite explain the hang I see.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

| Next | Last
Pages: 1 2
Prev: [PATCH] vhost: break out of polling loop on error
Next: block: defer the use of inline biovecs for discard requests