From: Mike Snitzer on
On Thu, Jul 01 2010 at 9:03am -0400,
Mike Snitzer <snitzer(a)redhat.com> wrote:

> On Thu, Jul 01 2010 at 6:49am -0400,
> FUJITA Tomonori <fujita.tomonori(a)lab.ntt.co.jp> wrote:
>
> > This fixes the discard page leak by using the q->unprep_rq_fn facility.
> >
> > q->unprep_rq_fn is called when all the data buffers (req->bio and
> > scsi_data_buffer) in the request are freed.
> >
> > sd_unprep() uses rq->buffer to free the discard page allocated in
> > sd_prepare_discard().
> >
> > Signed-off-by: FUJITA Tomonori <fujita.tomonori(a)lab.ntt.co.jp>
>
> Thanks for sorting this out Tomo, all 3 patches work great!
>
> BTW, there is one remaining (rare) leak in the allocation path.
>
> The following patch serves to fix it but I'm not sure if there is a more
> elegant way to address this.

I've continued to look at this to arrive at an alternative implementation.
Here is a summary of the problem:

A 'scsi_setup_discard_cmnd' return other than BLKPREP_OK will not cause
a discard request to get completely stripped down ('blk_finish_request'
doesn't call 'blk_unprep_request' because 'scsi_prep_return' does not set
REQ_DONTPREP for a non-BLKPREP_OK return). Therefore the discard
request's page will _not_ get cleaned up.

Aside from code inspection, I confirmed this by adding some test code to
force a one-time initial BLKPREP_DEFER return from
'scsi_setup_discard_cmnd'.
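
The leak and the one-time forced defer can be modelled in userspace. The
following is a minimal sketch, not kernel code: every toy_* name, the
inject_defer hook and the PREP_* enum are hypothetical stand-ins for the
kernel functions and BLKPREP_* codes named above, and a "page" is just a
counter.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-ins for BLKPREP_OK, BLKPREP_KILL and BLKPREP_DEFER. */
enum prep_result { PREP_OK, PREP_KILL, PREP_DEFER };

/* Toy request holding only the state needed to show the leak. */
struct toy_request {
	bool dontprep;	/* models REQ_DONTPREP */
	bool has_page;	/* models rq->buffer pointing at the discard page */
};

static int live_pages;		/* pages allocated and not yet freed */
static bool inject_defer;	/* test hook: fail the next prep once */

/* Models the unpatched setup: the page stays allocated on failure. */
static enum prep_result toy_setup_discard(struct toy_request *rq)
{
	live_pages++;		/* stands in for alloc_page() */
	rq->has_page = true;
	if (inject_defer) {
		inject_defer = false;	/* one-time failure only */
		return PREP_DEFER;
	}
	return PREP_OK;
}

/* Models scsi_prep_return(): only a successful prep sets the flag. */
static void toy_prep_return(struct toy_request *rq, enum prep_result ret)
{
	rq->dontprep = (ret == PREP_OK);
}

/*
 * Models blk_finish_request(): unprep runs only if the flag is set,
 * and it can free only the one page the request still refers to.
 */
static void toy_finish(struct toy_request *rq)
{
	if (rq->dontprep && rq->has_page) {
		live_pages--;
		rq->has_page = false;
		rq->dontprep = false;
	}
}

/* Prep (retrying after a defer), finish, and count leftover pages. */
int toy_run(bool force_one_defer)
{
	struct toy_request rq = { false, false };
	enum prep_result ret;

	live_pages = 0;
	inject_defer = force_one_defer;
	do {
		ret = toy_setup_discard(&rq);
		toy_prep_return(&rq, ret);
	} while (ret == PREP_DEFER);
	toy_finish(&rq);
	assert(live_pages >= 0);
	return live_pages;
}
```

In this model toy_run(false) ends with no pages outstanding, while
toy_run(true) ends with one: the page from the deferred attempt is never
released because the retry allocates a fresh one.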

> An alternative would be to check if the page is already allocated
> (before allocating the page in scsi_setup_discard_cmnd)?

Unfortunately this "alternative" won't work because it completely ignores
the case where BLKPREP_KILL is returned from 'scsi_setup_discard_cmnd'.
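
To make the BLKPREP_KILL gap concrete, here is a small userspace sketch
under the same kind of assumptions: the toy_* names and PREP_* enum are
hypothetical, a "page" is a counter, and the finish step is reduced to the
single rule that unprep only runs for a request that was prepped. Checking
for an existing page helps a deferred request that is retried, but a killed
request is never prepared again, so the check never gets a chance to run.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-ins for BLKPREP_OK, BLKPREP_KILL and BLKPREP_DEFER. */
enum prep_result { PREP_OK, PREP_KILL, PREP_DEFER };

struct toy_request {
	bool dontprep;	/* models REQ_DONTPREP */
	bool has_page;	/* models an already allocated discard page */
};

static int live_pages;	/* pages allocated and not yet freed */

/* The "alternative": allocate only if no page is left from last time. */
static enum prep_result toy_prep(struct toy_request *rq,
				 enum prep_result outcome)
{
	if (!rq->has_page) {
		live_pages++;
		rq->has_page = true;
	}
	rq->dontprep = (outcome == PREP_OK);
	return outcome;
}

/* Unprep (and so the free) only happens for a prepped request. */
static void toy_finish(struct toy_request *rq)
{
	if (rq->dontprep && rq->has_page) {
		live_pages--;
		rq->has_page = false;
	}
}

/* 'second' is only consulted when 'first' is a defer (the retry). */
int toy_run(enum prep_result first, enum prep_result second)
{
	struct toy_request rq = { false, false };

	live_pages = 0;
	if (toy_prep(&rq, first) == PREP_DEFER)
		toy_prep(&rq, second);
	toy_finish(&rq);
	assert(live_pages >= 0);
	return live_pages;
}
```

Here a defer followed by a successful retry leaves nothing behind, but any
sequence that ends in a kill still leaves one page allocated.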

> Please advise, thanks.

In short, I'm not too happy that the following patch doesn't allow for
centralized cleanup of the discard request's page (via sd_unprep_fn).
But in order to do that we'd likely have to:
 1) relax blk_finish_request's REQ_DONTPREP constraint
 2) add other weird conditionals within blk_unprep_request because
    the discard request wasn't _really_ prepared?

So given this I'm inclined to stick with the following patch.

Jens and/or James, what do you think?

Mike

> From: Mike Snitzer <snitzer(a)redhat.com>
> Subject: scsi: address leak in the error path of discard page allocation
>
> Be sure to free the discard page if scsi_setup_blk_pc_cmnd fails.
> E.g. returning BLKPREP_DEFER from scsi_setup_blk_pc_cmnd will not cause
> the request to be processed by sd_unprep_fn before the request is
> retried (preparation included).
>
> Signed-off-by: Mike Snitzer <snitzer(a)redhat.com>
>
> ---
> block/blk-core.c | 23 +++++++++++++++++++++++
> drivers/scsi/sd.c | 6 +++++-
> include/linux/blkdev.h | 1 +
> 3 files changed, 29 insertions(+), 1 deletion(-)
>
> Index: linux-2.6/drivers/scsi/sd.c
> ===================================================================
> --- linux-2.6.orig/drivers/scsi/sd.c
> +++ linux-2.6/drivers/scsi/sd.c
> @@ -466,7 +466,11 @@ static int scsi_setup_discard_cmnd(struc
>
>  	blk_add_request_payload(rq, page, len);
>  	ret = scsi_setup_blk_pc_cmnd(sdp, rq);
> -	rq->buffer = page_address(page);
> +	if (ret != BLKPREP_OK) {
> +		blk_clear_request_payload(rq);
> +		__free_page(page);
> +	} else
> +		rq->buffer = page_address(page);
>  	return ret;
>  }
>
> Index: linux-2.6/block/blk-core.c
> ===================================================================
> --- linux-2.6.orig/block/blk-core.c
> +++ linux-2.6/block/blk-core.c
> @@ -1164,6 +1164,29 @@ void blk_add_request_payload(struct requ
>  }
>  EXPORT_SYMBOL_GPL(blk_add_request_payload);
>
> +/**
> + * blk_clear_request_payload - clear a request's payload
> + * @rq: request to update
> + *
> + * The driver needs to take care of freeing the payload itself.
> + */
> +void blk_clear_request_payload(struct request *rq)
> +{
> +	struct bio *bio = rq->bio;
> +
> +	rq->__data_len = rq->resid_len = 0;
> +	rq->nr_phys_segments = 0;
> +	rq->buffer = NULL;
> +
> +	bio->bi_size = 0;
> +	bio->bi_vcnt = 0;
> +	bio->bi_phys_segments = 0;
> +
> +	bio->bi_io_vec->bv_page = NULL;
> +	bio->bi_io_vec->bv_len = 0;
> +}
> +EXPORT_SYMBOL_GPL(blk_clear_request_payload);
> +
>  void init_request_from_bio(struct request *req, struct bio *bio)
>  {
>  	req->cpu = bio->bi_comp_cpu;
> Index: linux-2.6/include/linux/blkdev.h
> ===================================================================
> --- linux-2.6.orig/include/linux/blkdev.h
> +++ linux-2.6/include/linux/blkdev.h
> @@ -781,6 +781,7 @@ extern void blk_insert_request(struct re
>  extern void blk_requeue_request(struct request_queue *, struct request *);
>  extern void blk_add_request_payload(struct request *rq, struct page *page,
>  		unsigned int len);
> +extern void blk_clear_request_payload(struct request *rq);
>  extern int blk_rq_check_limits(struct request_queue *q, struct request *rq);
>  extern int blk_lld_busy(struct request_queue *q);
>  extern int blk_rq_prep_clone(struct request *rq, struct request *rq_src,
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: James Bottomley on
On Thu, 2010-07-01 at 16:15 -0400, Mike Snitzer wrote:
> In short, I'm not too happy that the following patch doesn't allow for
> centralized cleanup of the discard request's page (via sd_unprep_fn).
> But in order to do that we'd likely have to:
> 1) relax blk_finish_request's REQ_DONTPREP constraint
> 2) add other weird conditionals within blk_unprep_request because
> the discard request wasn't _really_ prepared?
>
> So given this I'm inclined to stick with the following patch.
>
> Jens and/or James, what do you think?

The rules are pretty clear: Unprep is only called if the request gets
prepped ... that means you have to return BLKPREP_OK. Defer or kill
assume there's no teardown to do, so the allocation (if it took place)
must be reversed before returning them
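
As an illustration of that rule, here is a minimal userspace sketch, not
kernel code: the toy_* names and the PREP_* enum are hypothetical stand-ins,
and a "page" is a counter. The setup step undoes its own allocation before
it reports anything other than success, so the finish step never has to
deal with a request that was not prepped.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-ins for BLKPREP_OK, BLKPREP_KILL and BLKPREP_DEFER. */
enum prep_result { PREP_OK, PREP_KILL, PREP_DEFER };

struct toy_request {
	bool dontprep;	/* models REQ_DONTPREP */
	bool has_page;	/* models the attached discard page */
};

static int live_pages;	/* pages allocated and not yet freed */

/* 'inner' plays the role of the result of the inner setup call. */
static enum prep_result toy_setup_discard(struct toy_request *rq,
					  enum prep_result inner)
{
	live_pages++;			/* stands in for alloc_page() */
	rq->has_page = true;		/* payload attached */
	if (inner != PREP_OK) {
		rq->has_page = false;	/* detach the payload ... */
		live_pages--;		/* ... and free the page */
	}
	rq->dontprep = (inner == PREP_OK);
	return inner;
}

/* Unprep only runs for a request that really was prepped. */
static void toy_finish(struct toy_request *rq)
{
	if (rq->dontprep && rq->has_page) {
		live_pages--;
		rq->has_page = false;
	}
}

/* 'second' is only consulted when 'first' is a defer (the retry). */
int toy_run(enum prep_result first, enum prep_result second)
{
	struct toy_request rq = { false, false };

	live_pages = 0;
	if (toy_setup_discard(&rq, first) == PREP_DEFER)
		toy_setup_discard(&rq, second);
	toy_finish(&rq);
	assert(live_pages >= 0);
	return live_pages;
}
```

With the reversal in the error path, every combination of outcomes ends
with no pages outstanding in this model.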

James


From: Mike Snitzer on
On Thu, Jul 01 2010 at 4:19pm -0400,
James Bottomley <James.Bottomley(a)suse.de> wrote:

> The rules are pretty clear: Unprep is only called if the request gets
> prepped ... that means you have to return BLKPREP_OK. Defer or kill
> assume there's no teardown to do, so the allocation (if it took place)
> must be reversed before returning them

OK, thanks for clarifying. This confirms that the general approach I
took in this patch is correct. It remains to be seen if Jens is
agreeable with blk_clear_request_payload.

I know Christoph thought my introduction and use of
blk_clear_request_payload was reasonable. Christoph, please feel free
to add your Ack to this patch if you approve.

I look forward to feedback from Tomo and Jens now too. Hopefully Jens
will pick this patch up.

regards,
Mike
From: FUJITA Tomonori on
On Thu, 01 Jul 2010 15:19:08 -0500
James Bottomley <James.Bottomley(a)suse.de> wrote:

> The rules are pretty clear: Unprep is only called if the request gets
> prepped ... that means you have to return BLKPREP_OK. Defer or kill
> assume there's no teardown to do, so the allocation (if it took place)
> must be reversed before returning them

It seems that scsi-ml calls scsi_unprep_request() for not-prepped
requests in the scsi_init_io error path. So we could move that
scsi_unprep_request() call to the error path in scsi_prep_return(). Then
we can free the discard page in a single place.

Applying the rule strictly is fine by me too; we would remove
scsi_unprep_request() from the scsi_init_io error path and clean things
up in each prep function's error path.
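
The first option (one cleanup point in the prep-return error path) could
look roughly like the following userspace sketch. It is only a model: the
toy_* names and the PREP_* enum are hypothetical, a "page" is a counter,
and the real scsi_prep_return() and scsi_unprep_request() do more than is
shown here.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-ins for BLKPREP_OK, BLKPREP_KILL and BLKPREP_DEFER. */
enum prep_result { PREP_OK, PREP_KILL, PREP_DEFER };

struct toy_request {
	bool dontprep;	/* models REQ_DONTPREP */
	bool has_page;	/* models the attached discard page */
};

static int live_pages;	/* pages allocated and not yet freed */

/* Single place that releases whatever the prep step allocated. */
static void toy_unprep(struct toy_request *rq)
{
	if (rq->has_page) {
		live_pages--;
		rq->has_page = false;
	}
	rq->dontprep = false;
}

/* The prep step itself does no cleanup in this variant. */
static enum prep_result toy_setup_discard(struct toy_request *rq,
					  enum prep_result inner)
{
	live_pages++;		/* stands in for alloc_page() */
	rq->has_page = true;
	return inner;
}

/* Error returns are unprepped here, successes are marked as prepped. */
static enum prep_result toy_prep_return(struct toy_request *rq,
					enum prep_result ret)
{
	if (ret == PREP_OK)
		rq->dontprep = true;
	else
		toy_unprep(rq);
	return ret;
}

static void toy_finish(struct toy_request *rq)
{
	if (rq->dontprep)
		toy_unprep(rq);
}

/* 'second' is only consulted when 'first' is a defer (the retry). */
int toy_run(enum prep_result first, enum prep_result second)
{
	struct toy_request rq = { false, false };

	live_pages = 0;
	if (toy_prep_return(&rq, toy_setup_discard(&rq, first)) == PREP_DEFER)
		toy_prep_return(&rq, toy_setup_discard(&rq, second));
	toy_finish(&rq);
	assert(live_pages >= 0);
	return live_pages;
}
```

The second option corresponds to moving the toy_unprep() work back into
toy_setup_discard()'s own failure branch instead.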


Btw, is blk_clear_request_payload() necessary?

Making sure that a request is clean is not a bad idea, but if we hit
BLKPREP_KILL or BLKPREP_DEFER, we call blk_end_request(). Can't
blk_end_request() free a request properly even if we don't do something
like blk_clear_request_payload()?
From: Christoph Hellwig on
On Thu, Jul 01, 2010 at 09:03:28AM -0400, Mike Snitzer wrote:
> An alternative would be to check if the page is already allocated
> (before allocating the page in scsi_setup_discard_cmnd)?

Ah, should have read your mail first, sorry..
