From: FUJITA Tomonori on
On Sun, 27 Jun 2010 11:26:52 +0200
Christoph Hellwig <hch(a)lst.de> wrote:

> On Sun, Jun 27, 2010 at 05:49:29PM +0900, FUJITA Tomonori wrote:
> > On Sat, 26 Jun 2010 15:56:50 -0400
> > Mike Snitzer <snitzer(a)redhat.com> wrote:
> >
> > > Fix leaks introduced via "block: don't allocate a payload for discard
> > > request" commit a1d949f5f44.
> > >
> > > sd_done() is not called for REQ_TYPE_BLOCK_PC commands so cleanup
> > > discard request's payload directly in scsi_finish_command().
> >
> > Instead of adding another discard hack to scsi_finish_command(), how
> > about converting discard to REQ_TYPE_FS request? discard is FS request
> > from the perspective of the block layer. It also fixes a problem that
> > discard isn't retried in the case of UNIT ATTENTION.
> >
> > I think that we can get more cleaner code if we handle discard as
> > normal (fs) request in the block layer (and scsi-ml). We need more
> > changes but this patch is the first step.
>
> Making discard a REQ_TYPE_FS inside scsi (it already is before entering
> sd_prep_fn) means we'll need to special case it all over the I/O
> submission and completion path. Having the payload length not matching

Hmm, my patch doesn't add any special case in scsi submission and
completion. sd_prep_fn already has a hack for discard to set
bi->bi_size to rq->__data_size so scsi can tell the block layer to
finish discard requests.

Adding another special case for discard to scsi_io_completion()
doesn't look good.

About the block layer, we already have special case for discard
everywhere (rq->cmd_flags & REQ_DISCARD).


> the transfer length is something we don't expect for FS requests.

Yeah, that's tricky. I'm not sure yet which is better; change how the
block layer handles the transfer length or let the lower layer to add
pages (as we do now).


> > index e16185b..9e15c46 100644
> > --- a/block/blk-lib.c
> > +++ b/block/blk-lib.c
> > @@ -20,6 +20,10 @@ static void blkdev_discard_end_io(struct bio *bio, int err)
> > if (bio->bi_private)
> > complete(bio->bi_private);
> >
> > + /* free the page that the lower layer allocated */
> > + if (bio_page(bio))
> > + __free_page(bio_page(bio));
> > +
>
> This is exactly what this patchkit gets rid off. Having a payload
> page that the caller tracks (previously fully, with this patch only for
> freeing) makes DM's life a lot harder. Remember we don't actually store
> any payload in there before entering sd_prep_fn - it's just that the
> scsi commands implementing discards need some payload - either a sector
> sizes zero filled buffer for WRITE SAME, or an LBA/len encoding inside
> the payload for UNMAP.

It's so bad if the block layer frees pages that the lower layer
allocates? I thought it's ok if the block layer doesn't allocate.

It's better if sd_done() frees a page? As my patch does, if we handle
discard as FS in scsi-ml, sd_done() is called.


> > - rq->cmd_type = REQ_TYPE_BLOCK_PC;
> > + rq->cmd_type = REQ_TYPE_FS;
>
> No need to set REQ_TYPE_FS here, it's already set up that way.

Oops, sure.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: FUJITA Tomonori on
On Sun, 27 Jun 2010 19:01:33 +0900
FUJITA Tomonori <fujita.tomonori(a)lab.ntt.co.jp> wrote:

> On Sun, 27 Jun 2010 11:26:52 +0200
> Christoph Hellwig <hch(a)lst.de> wrote:
>
> > On Sun, Jun 27, 2010 at 05:49:29PM +0900, FUJITA Tomonori wrote:
> > > On Sat, 26 Jun 2010 15:56:50 -0400
> > > Mike Snitzer <snitzer(a)redhat.com> wrote:
> > >
> > > > Fix leaks introduced via "block: don't allocate a payload for discard
> > > > request" commit a1d949f5f44.
> > > >
> > > > sd_done() is not called for REQ_TYPE_BLOCK_PC commands so cleanup
> > > > discard request's payload directly in scsi_finish_command().
> > >
> > > Instead of adding another discard hack to scsi_finish_command(), how
> > > about converting discard to REQ_TYPE_FS request? discard is FS request
> > > from the perspective of the block layer. It also fixes a problem that
> > > discard isn't retried in the case of UNIT ATTENTION.
> > >
> > > I think that we can get more cleaner code if we handle discard as
> > > normal (fs) request in the block layer (and scsi-ml). We need more
> > > changes but this patch is the first step.
> >
> > Making discard a REQ_TYPE_FS inside scsi (it already is before entering
> > sd_prep_fn) means we'll need to special case it all over the I/O
> > submission and completion path. Having the payload length not matching
>
> Hmm, my patch doesn't add any special case in scsi submission and
> completion. sd_prep_fn already has a hack for discard to set
> bi->bi_size to rq->__data_size so scsi can tell the block layer to
> finish discard requests.
>
> Adding another special case for discard to scsi_io_completion()
> doesn't look good.
>
> About the block layer, we already have special case for discard
> everywhere (rq->cmd_flags & REQ_DISCARD).
>
>
> > the transfer length is something we don't expect for FS requests.
>
> Yeah, that's tricky. I'm not sure yet which is better; change how the
> block layer handles the transfer length or let the lower layer to add
> pages (as we do now).
>
>
> > > index e16185b..9e15c46 100644
> > > --- a/block/blk-lib.c
> > > +++ b/block/blk-lib.c
> > > @@ -20,6 +20,10 @@ static void blkdev_discard_end_io(struct bio *bio, int err)
> > > if (bio->bi_private)
> > > complete(bio->bi_private);
> > >
> > > + /* free the page that the lower layer allocated */
> > > + if (bio_page(bio))
> > > + __free_page(bio_page(bio));
> > > +
> >
> > This is exactly what this patchkit gets rid off. Having a payload
> > page that the caller tracks (previously fully, with this patch only for
> > freeing) makes DM's life a lot harder. Remember we don't actually store
> > any payload in there before entering sd_prep_fn - it's just that the
> > scsi commands implementing discards need some payload - either a sector
> > sizes zero filled buffer for WRITE SAME, or an LBA/len encoding inside
> > the payload for UNMAP.
>
> It's so bad if the block layer frees pages that the lower layer
> allocates? I thought it's ok if the block layer doesn't allocate.
>
> It's better if sd_done() frees a page? As my patch does, if we handle
> discard as FS in scsi-ml, sd_done() is called.

How about this?

=
From: FUJITA Tomonori <fujita.tomonori(a)lab.ntt.co.jp>
Subject: [PATCH] convert discard to REQ_TYPE_FS instead of REQ_TYPE_BLOCK_PC

Fixes the two issues:

- leak of pages that scsi_setup_discard_cmnd() allocates (because we
don't call sd_done for pc requets).

- discard requests aren't retried when possible (e.g. UNIT ATTENTION).

Signed-off-by: FUJITA Tomonori <fujita.tomonori(a)lab.ntt.co.jp>
---
drivers/scsi/sd.c | 1 -
1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index d447726..056c8e1 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -432,7 +432,6 @@ static int scsi_setup_discard_cmnd(struct scsi_device *sdp, struct request *rq)
nr_sectors >>= 3;
}

- rq->cmd_type = REQ_TYPE_BLOCK_PC;
rq->timeout = SD_TIMEOUT;

memset(rq->cmd, 0, rq->cmd_len);
--
1.6.5

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: FUJITA Tomonori on
On Sun, 27 Jun 2010 13:07:12 +0200
Christoph Hellwig <hch(a)lst.de> wrote:

> > How about this?
>
> As I tried to explain before this utterly confuses the I/O completion
> path. With the patch applied even a simple mkfs.xfs that issues discard
> just hangs.

Wired. I just tried mkfs.xfs against scsi_debug with my block patches
(I saw one discard command). Seemed that it worked fine.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Mike Snitzer on
On Sun, Jun 27 2010 at 8:32am -0400,
FUJITA Tomonori <fujita.tomonori(a)lab.ntt.co.jp> wrote:

> On Sun, 27 Jun 2010 13:07:12 +0200
> Christoph Hellwig <hch(a)lst.de> wrote:
>
> > > How about this?
> >
> > As I tried to explain before this utterly confuses the I/O completion
> > path. With the patch applied even a simple mkfs.xfs that issues discard
> > just hangs.
>
> Wired. I just tried mkfs.xfs against scsi_debug with my block patches
> (I saw one discard command). Seemed that it worked fine.

My leak fixes have been tested extensively against all permuations of
devices with discards (ATA trim, SCSI UNMAP, SCSI WRTIE SAME w/ unmap=1).

I think we need to get Christoph's discard payload transformation
complete by fixing the leaks _without_ trying to rework how discard
commands are tagged, etc. E.g. fix what Jens already has staged in
linux-2.6-block's 'for-next' and 'for-2.6.36'.

With that sorted out we can then look at longer term changes to cleanup
discard request processing.

Regards,
Mike
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: James Bottomley on
linux-scsi cc added, since it's a SCSI patch.

On Sat, 2010-06-26 at 15:56 -0400, Mike Snitzer wrote:
> Fix leaks introduced via "block: don't allocate a payload for discard
> request" commit a1d949f5f44.
>
> sd_done() is not called for REQ_TYPE_BLOCK_PC commands so cleanup
> discard request's payload directly in scsi_finish_command().
>
> Also cleanup page allocated for discard payload in
> scsi_setup_discard_cmnd's scsi_setup_blk_pc_cmnd error path.
>
> Signed-off-by: Mike Snitzer <snitzer(a)redhat.com>
> ---
> block/blk-core.c | 23 +++++++++++++++++++++++
> drivers/scsi/scsi.c | 8 ++++++++
> drivers/scsi/sd.c | 18 ++++++++----------
> include/linux/blkdev.h | 1 +
> 4 files changed, 40 insertions(+), 10 deletions(-)
>
> diff --git a/block/blk-core.c b/block/blk-core.c
> index 98b4cee..07925aa 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -1167,6 +1167,29 @@ void blk_add_request_payload(struct request *rq, struct page *page,
> }
> EXPORT_SYMBOL_GPL(blk_add_request_payload);
>
> +/**
> + * blk_clear_request_payload - clear a request's payload
> + * @rq: request to update
> + *
> + * The driver needs to take care of freeing the payload itself.
> + */
> +void blk_clear_request_payload(struct request *rq)
> +{
> + struct bio *bio = rq->bio;
> +
> + rq->__data_len = rq->resid_len = 0;
> + rq->nr_phys_segments = 0;
> + rq->buffer = NULL;
> +
> + bio->bi_size = 0;
> + bio->bi_vcnt = 0;
> + bio->bi_phys_segments = 0;
> +
> + bio->bi_io_vec->bv_page = NULL;
> + bio->bi_io_vec->bv_len = 0;
> +}
> +EXPORT_SYMBOL_GPL(blk_clear_request_payload);
> +
> void init_request_from_bio(struct request *req, struct bio *bio)
> {
> req->cpu = bio->bi_comp_cpu;
> diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
> index ad0ed21..69c7ea4 100644
> --- a/drivers/scsi/scsi.c
> +++ b/drivers/scsi/scsi.c
> @@ -851,6 +851,14 @@ void scsi_finish_command(struct scsi_cmnd *cmd)
> */
> if (good_bytes == old_good_bytes)
> good_bytes -= scsi_get_resid(cmd);
> + } else if (cmd->request->cmd_flags & REQ_DISCARD) {
> + /*
> + * If this is a discard request that originated from the kernel
> + * we need to free our payload here. Note that we need to check
> + * the request flag as the normal payload rules apply for
> + * pass-through UNMAP / WRITE SAME requests.
> + */
> + __free_page(bio_page(cmd->request->bio));

This is another layering violation: the page is allocated in the Upper
layer and freed in the mid-layer.

I really hate these growing contortions for discard. They're a clear
signal that we haven't implemented it right.

So let's first work out how it should be done. I really like Tomo's
idea of doing discard through the normal REQ_TYPE_FS route, which means
we can control the setup in prep and the tear down in done, all confined
to the ULD.

Where I think I'm at is partially what Christoph says: The command
transformation belongs in the ULD so that's where the allocation and
deallocation should be, and partly what Tomo says in that we should
eliminate the special case paths.

The payload vs actual request size should be a red herring if we've got
everything correct: only the ULD cares about the request parameters.
Once we've got everything set up, the mid layer and LLD should only care
about the parameters in the command, so we can confine the size changing
part to the ULD doing the discard.

Could someone take a stab at this and see if it works without layering
violations or any other problematic signals?

Thanks,

James


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/