From: Mike Snitzer on
On Tue, Mar 2, 2010 at 10:49 PM, Dmitry Monakhov <dmonakhov(a)openvz.org> wrote:
> Jens Axboe <jens.axboe(a)oracle.com> writes:
>
>> On Sat, Feb 27 2010, Dmitry Monakhov wrote:
>>> merge_bvec_fn() returns bvec->bv_len on success. So we have to check
>>> against this value. But in case of fs_optimization merge we compare
>>> with wrong value. This patch must be included in
>>> �b428cd6da7e6559aca69aa2e3a526037d3f20403
>>> But accidentally i've forgot to add this in the initial patch.
>>> To make things straight let's replace all such checks.
>>> In fact this makes code easy to understand.
>>
>> Agree, applied.
> Ohh.. as you already know this patch break dm-layer. Sorry.
> This is because dm->merge may return more than requested. So correct
> check must test against less what requested. Correct patch attached.

Yes, it is quite common for dm_merge_bvec() to return greater than the
requested length.

But dm_merge_bvec() returning a maximum length, rather than requested,
isn't special. All the other blk_queue_merge_bvec() callers' merge
functions appear to return "maximum amount of bytes we can accept at
this offset" too.

__bio_add_page() only needs to care about whether the
'q->merge_bvec_fn' return is _less than_ the requested length.

> From 145fb49bf2251f445ca29c5218333367448932d6 Mon Sep 17 00:00:00 2001
> From: Dmitry Monakhov <dmonakhov(a)openvz.org>
> Date: Wed, 3 Mar 2010 06:28:06 +0300
> Subject: [PATCH] blkdev: fix merge_bvec_fn return value checks v2
>
> merge_bvec_fn() returns bvec->bv_len on success. So we have to check
> against this value. But in case of fs_optimization merge we compare
> with wrong value. This patch must be included in
> �b428cd6da7e6559aca69aa2e3a526037d3f20403
> But accidentally i've forgot to add this in the initial patch.
> To make things straight let's replace all such checks.
> In fact this makes code easy to understand.
>
> Signed-off-by: Dmitry Monakhov <dmonakhov(a)openvz.org>
> ---
> �fs/bio.c | � �4 ++--
> �1 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/fs/bio.c b/fs/bio.c
> index 88094af..975657a 100644
> --- a/fs/bio.c
> +++ b/fs/bio.c
> @@ -557,7 +557,7 @@ static int __bio_add_page(struct request_queue *q, struct bio *bio, struct page
> � � � � � � � � � � � � � � � � � � � �.bi_rw = bio->bi_rw,
> � � � � � � � � � � � � � � � �};
>
> - � � � � � � � � � � � � � � � if (q->merge_bvec_fn(q, &bvm, prev) < len) {
> + � � � � � � � � � � � � � � � if (q->merge_bvec_fn(q, &bvm, prev) < prev->bv_len) {
> � � � � � � � � � � � � � � � � � � � �prev->bv_len -= len;
> � � � � � � � � � � � � � � � � � � � �return 0;
> � � � � � � � � � � � � � � � �}
> @@ -611,7 +611,7 @@ static int __bio_add_page(struct request_queue *q, struct bio *bio, struct page
> � � � � � � � � * merge_bvec_fn() returns number of bytes it can accept
> � � � � � � � � * at this offset
> � � � � � � � � */
> - � � � � � � � if (q->merge_bvec_fn(q, &bvm, bvec) < len) {
> + � � � � � � � if (q->merge_bvec_fn(q, &bvm, bvec) < bvec->bv_len) {
> � � � � � � � � � � � �bvec->bv_page = NULL;
> � � � � � � � � � � � �bvec->bv_len = 0;
> � � � � � � � � � � � �bvec->bv_offset = 0;

NOTE this 2nd hunk doesn't change anything at all because: bvec->bv_len = len;

Mike
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Dmitry Monakhov on
Mike Snitzer <snitzer(a)redhat.com> writes:

> On Tue, Mar 2, 2010 at 10:49 PM, Dmitry Monakhov <dmonakhov(a)openvz.org> wrote:
>> Jens Axboe <jens.axboe(a)oracle.com> writes:
>>
>>> On Sat, Feb 27 2010, Dmitry Monakhov wrote:
>>>> merge_bvec_fn() returns bvec->bv_len on success. So we have to check
>>>> against this value. But in case of fs_optimization merge we compare
>>>> with wrong value. This patch must be included in
>>>>  b428cd6da7e6559aca69aa2e3a526037d3f20403
>>>> But accidentally i've forgot to add this in the initial patch.
>>>> To make things straight let's replace all such checks.
>>>> In fact this makes code easy to understand.
>>>
>>> Agree, applied.
>> Ohh.. as you already know this patch break dm-layer. Sorry.
>> This is because dm->merge may return more than requested. So correct
>> check must test against less what requested. Correct patch attached.
>
> Yes, it is quite common for dm_merge_bvec() to return greater than the
> requested length.
>
> But dm_merge_bvec() returning a maximum length, rather than requested,
> isn't special. All the other blk_queue_merge_bvec() callers' merge
> functions appear to return "maximum amount of bytes we can accept at
> this offset" too.
What for? Does it allow us to make some optimization?
For example like follows:
bio_add_pageS(bio, **pages) {
/* call merge_fn only one untill all space exhausted */
ret = merge_fn() (this returns huge value (1024*1024))
while (ret) {
bio->bi_io_vec[bio->bi_vcnt - 1].bv_page = page;
...
ret -= PAGE_SIZE;
bio->bi_vcnt++;
}
}
IMHO the answer is *NO*, this code will unlikely to work.
>
> __bio_add_page() only needs to care about whether the
> 'q->merge_bvec_fn' return is _less than_ the requested length.
>
>> From 145fb49bf2251f445ca29c5218333367448932d6 Mon Sep 17 00:00:00 2001
>> From: Dmitry Monakhov <dmonakhov(a)openvz.org>
>> Date: Wed, 3 Mar 2010 06:28:06 +0300
>> Subject: [PATCH] blkdev: fix merge_bvec_fn return value checks v2
>>
>> merge_bvec_fn() returns bvec->bv_len on success. So we have to check
>> against this value. But in case of fs_optimization merge we compare
>> with wrong value. This patch must be included in
>>  b428cd6da7e6559aca69aa2e3a526037d3f20403
>> But accidentally i've forgot to add this in the initial patch.
>> To make things straight let's replace all such checks.
>> In fact this makes code easy to understand.
>>
>> Signed-off-by: Dmitry Monakhov <dmonakhov(a)openvz.org>
>> ---
>>  fs/bio.c |    4 ++--
>>  1 files changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/fs/bio.c b/fs/bio.c
>> index 88094af..975657a 100644
>> --- a/fs/bio.c
>> +++ b/fs/bio.c
>> @@ -557,7 +557,7 @@ static int __bio_add_page(struct request_queue *q, struct bio *bio, struct page
>>                                        .bi_rw = bio->bi_rw,
>>                                };
>>
>> -                               if (q->merge_bvec_fn(q, &bvm, prev) < len) {
>> +                               if (q->merge_bvec_fn(q, &bvm, prev) < prev->bv_len) {
>>                                        prev->bv_len -= len;
>>                                        return 0;
>>                                }
>> @@ -611,7 +611,7 @@ static int __bio_add_page(struct request_queue *q, struct bio *bio, struct page
>>                 * merge_bvec_fn() returns number of bytes it can accept
>>                 * at this offset
>>                 */
>> -               if (q->merge_bvec_fn(q, &bvm, bvec) < len) {
>> +               if (q->merge_bvec_fn(q, &bvm, bvec) < bvec->bv_len) {
>>                        bvec->bv_page = NULL;
>>                        bvec->bv_len = 0;
>>                        bvec->bv_offset = 0;
>
> NOTE this 2nd hunk doesn't change anything at all because: bvec->bv_len = len;
Yess. IMHO this makes code more readable.
>
> Mike
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Mike Snitzer on
On Wed, Mar 03 2010 at 1:45pm -0500,
Dmitry Monakhov <dmonakhov(a)openvz.org> wrote:

> Mike Snitzer <snitzer(a)redhat.com> writes:
>
> > On Tue, Mar 2, 2010 at 10:49 PM, Dmitry Monakhov <dmonakhov(a)openvz.org> wrote:
> >> Jens Axboe <jens.axboe(a)oracle.com> writes:
> >>
> >>> On Sat, Feb 27 2010, Dmitry Monakhov wrote:
> >>>> merge_bvec_fn() returns bvec->bv_len on success. So we have to check
> >>>> against this value. But in case of fs_optimization merge we compare
> >>>> with wrong value. This patch must be included in
> >>>> �b428cd6da7e6559aca69aa2e3a526037d3f20403
> >>>> But accidentally i've forgot to add this in the initial patch.
> >>>> To make things straight let's replace all such checks.
> >>>> In fact this makes code easy to understand.
> >>>
> >>> Agree, applied.
> >> Ohh.. as you already know this patch break dm-layer. Sorry.
> >> This is because dm->merge may return more than requested. So correct
> >> check must test against less what requested. Correct patch attached.
> >
> > Yes, it is quite common for dm_merge_bvec() to return greater than the
> > requested length.
> >
> > But dm_merge_bvec() returning a maximum length, rather than requested,
> > isn't special. All the other blk_queue_merge_bvec() callers' merge
> > functions appear to return "maximum amount of bytes we can accept at
> > this offset" too.
> What for? Does it allow us to make some optimization?

I wasn't suggesting the behavior of the current merge functions
returning maximum is important or useful. I was just pointing out what
the interface is and that dm_merge_bvec() is no different than the
others.

> For example like follows:
> bio_add_pageS(bio, **pages) {
> /* call merge_fn only one untill all space exhausted */
> ret = merge_fn() (this returns huge value (1024*1024))
> while (ret) {
> bio->bi_io_vec[bio->bi_vcnt - 1].bv_page = page;
> ...
> ret -= PAGE_SIZE;
> bio->bi_vcnt++;
> }
> }
> IMHO the answer is *NO*, this code will unlikely to work.

Conversely, I see no value in imposing that these 'q->merge_bvec_fn'
functions return at most the requested length. Can't even really see it
making the __bio_add_page() code more readable.

> > __bio_add_page() only needs to care about whether the
> > 'q->merge_bvec_fn' return is _less than_ the requested length.

Linux has all sorts of internal interfaces that are "odd"... the current
'q->merge_bvec_fn' interface included. But odd is not a problem (nor is
it "broken") unless you make changes that don't consider how the current
interface is defined.

But I digress...

Mike
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Dmitry Monakhov on
Mike Snitzer <snitzer(a)redhat.com> writes:

> Linux has all sorts of internal interfaces that are "odd"... the current
> 'q->merge_bvec_fn' interface included. But odd is not a problem (nor is
> it "broken") unless you make changes that don't consider how the current
> interface is defined.
Ok. then cant you please explain more historical questions
1) Why bio_add_page() can not add less data than requested?
Seems that it doesn't make caller's code much complicate
Off course barrier bio is special case. I don't consider it here.

2) What statement "bio_add_page() must accept at least one page"
exactly means?
IMHO this means that bio_add_page() must accept at least
one page with len (PAGE_SIZE - offset). Or more restricted
statemnt that first bio_add_page() must be always successfull.

But currently in some places this rule treated as what all bio
which has size less whan PAGE_SIZE are accepted. And in x86 such
bio may has up to 8 pages/bvecs.
>
> But I digress...
>
> Mike
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Jens Axboe on
On Wed, Mar 03 2010, Dmitry Monakhov wrote:
> Mike Snitzer <snitzer(a)redhat.com> writes:
>
> > Linux has all sorts of internal interfaces that are "odd"... the current
> > 'q->merge_bvec_fn' interface included. But odd is not a problem (nor is
> > it "broken") unless you make changes that don't consider how the current
> > interface is defined.
> Ok. then cant you please explain more historical questions
> 1) Why bio_add_page() can not add less data than requested?
> Seems that it doesn't make caller's code much complicate
> Off course barrier bio is special case. I don't consider it here.

Because the caller may not expect that, a partial add may not make any
sense to the caller. The bio code obviously doesn't care. And it
certainly could complicate the caller a lot, if they need to now issue
and wait for several bio's instead of just a single one. Now a single
completion queue and wait_for_completion() is not enough.

> 2) What statement "bio_add_page() must accept at least one page"
> exactly means?
> IMHO this means that bio_add_page() must accept at least
> one page with len (PAGE_SIZE - offset). Or more restricted
> statemnt that first bio_add_page() must be always successfull.

It's really 'first add must succeed', the restriction being that you
cannot rely on that first add being more than a single page. So the rule
is that you must accept at least a page at any offset if the bio is
currently empty, since we know that a page is typically our IO
granularity.

> But currently in some places this rule treated as what all bio
> which has size less whan PAGE_SIZE are accepted. And in x86 such
> bio may has up to 8 pages/bvecs.

Not sure I follow what you are trying to say here.

--
Jens Axboe

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/