From: Michael Kerrisk on
On Mon, May 24, 2010 at 3:43 AM, OGAWA Hirofumi
<hirofumi(a)mail.parknet.co.jp> wrote:
> Jens Axboe <jens.axboe(a)oracle.com> writes:
>
>>> > We can easily make F_GETPIPE_SZ return bytes, but I don't think passing
>>> > in bytes to F_SETPIPE_SZ makes a lot of sense. The pipe array must be a
>>> > power of 2 in pages. So the question is if that makes the API cleaner,
>>> > passing in number of pages but returning bytes? Or pass in bytes all
>>> > around, but have F_SETPIPE_SZ round to the nearest multiple of pow2 in
>>> > pages if need be. Then it would return a size at least what was passed
>>> > in, or error.
>
> I really think "power of 2 in pages" is simply current implementation
> detail, not detail of pipe API.

That's a good point.

>>> I'd recommend this: Pass it in and out in bytes. Don't round to a
>>> power of 2. Require the user to know what they are doing. Give an
>>> error if the user doesn't supply a power-of-2 * page-size for
>>> F_SETPIPE_SZ. (Again, consider the case of architectures �with
>>> switchable page sizes.)
>>
>> But is there much point in erroring on an incorrect size? If the
>> application says "I need at least 120kb of space in there", kernel
>> returns "OK, you got 128kb". Would returning -1/EINVAL for that case
>> really make a better API? Doesn't seem like it to me.
>
> FWIW, my first impression of this was setsockopt(SO_RCV/SNDBUF) of unix
> socket. Well, API itself wouldn't say "at least this size" or "exactly
> this size", so, in here, important thing is consistency of interfaces, I
> think. (And the both is sane API at least for me if those had
> consistency in the system.)
>
> Well, so how about set/get in bytes, and kernel will set "at least
> specified size" actually like setsockopt(SO_RCV/SNDBUF)?

The "at least" idea makes sense. So, I'd change my recommendation to:

Pass the buffer size in and out in bytes (for consistency with other
APIs). Round the input (F_SETPIPE_SZ) value up as required by the
implementation. For the output (F_GETPIPE_SZ) value do one of the
following:
a) Return the value given on input.
b) Return the rounded up value actually used by the kernel.

I suspect (b) might be more useful: if an application cares enough
about pipe size to want to change it, then at least some such
applications might care to know exactly the size that the kernel used.
(And: I can't see any downside to (b).)
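
To make the "round up, then report the real size" rule concrete, here is a
minimal sketch of the arithmetic. It is not the kernel's code: the function
name round_pipe_size() and the explicit page_size parameter are assumptions
made for illustration, and overflow on very large requests is ignored.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not taken from the kernel source): round a
 * requested pipe size in bytes up to a power-of-two number of pages
 * and return the resulting size in bytes. Overflow is not handled.
 */
size_t round_pipe_size(size_t bytes, size_t page_size)
{
    size_t pages;
    size_t rounded = 1;

    assert(page_size > 0);

    /* Convert bytes to whole pages, rounding up; use at least one page. */
    pages = (bytes + page_size - 1) / page_size;
    if (pages == 0)
        pages = 1;

    /* Find the smallest power of two that is >= pages. */
    while (rounded < pages)
        rounded <<= 1;

    return rounded * page_size;
}
```

With 4096-byte pages, a request for 120 KB (30 pages) comes out as 128 KB
(32 pages), which matches the example Jens gave. Under option (b), that
rounded value is what F_GETPIPE_SZ would report.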

One other comment about the interface. We have

        if (!capable(CAP_SYS_ADMIN) && arg > pipe_max_pages)
                return -EINVAL;

The usual error on a capability denied is EPERM. Please change.
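
As a rough illustration of the requested change, the check can be modelled
in user space like this. The function name check_pipe_size_request() and
its parameters are hypothetical; the capability test is reduced to a plain
boolean, so this is only a sketch of the error-code choice, not the actual
patch.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical model of the limit check: a caller without
 * CAP_SYS_ADMIN (represented here by a boolean) may not exceed the
 * configured limit. The failure is a permission problem, so the
 * result is -EPERM rather than -EINVAL.
 */
int check_pipe_size_request(bool has_cap_sys_admin, size_t requested,
                            size_t limit)
{
    if (!has_cap_sys_admin && requested > limit)
        return -EPERM;

    return 0;
}
```

The point of EPERM here is that the same request would succeed for a
privileged caller, so the argument itself is not invalid.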

Cheers,

Michael


--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Author of "The Linux Programming Interface" http://blog.man7.org/
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Jens Axboe on
On Mon, May 24 2010, OGAWA Hirofumi wrote:
> Jens Axboe <jens.axboe(a)oracle.com> writes:
>
> >> > We can easily make F_GETPIPE_SZ return bytes, but I don't think passing
> >> > in bytes to F_SETPIPE_SZ makes a lot of sense. The pipe array must be a
> >> > power of 2 in pages. So the question is if that makes the API cleaner,
> >> > passing in number of pages but returning bytes? Or pass in bytes all
> >> > around, but have F_SETPIPE_SZ round to the nearest multiple of pow2 in
> >> > pages if need be. Then it would return a size at least what was passed
> >> > in, or error.
>
> I really think "power of 2 in pages" is simply current implementation
> detail, not detail of pipe API.

Completely agree, one more reason not to expose that dependency in the
API.

> >> I'd recommend this: Pass it in and out in bytes. Don't round to a
> >> power of 2. Require the user to know what they are doing. Give an
> >> error if the user doesn't supply a power-of-2 * page-size for
> >> F_SETPIPE_SZ. (Again, consider the case of architectures with
> >> switchable page sizes.)
> >
> > But is there much point in erroring on an incorrect size? If the
> > application says "I need at least 120kb of space in there", kernel
> > returns "OK, you got 128kb". Would returning -1/EINVAL for that case
> > really make a better API? Doesn't seem like it to me.
>
> FWIW, my first impression of this was setsockopt(SO_RCV/SNDBUF) of unix
> socket. Well, API itself wouldn't say "at least this size" or "exactly
> this size", so, in here, important thing is consistency of interfaces, I
> think. (And the both is sane API at least for me if those had
> consistency in the system.)
>
> Well, so how about set/get in bytes, and kernel will set "at least
> specified size" actually like setsockopt(SO_RCV/SNDBUF)?

Isn't that pretty much what I described?

--
Jens Axboe

From: Michael Kerrisk on
On Mon, May 24, 2010 at 9:05 AM, Jens Axboe <jens.axboe(a)oracle.com> wrote:
>> The "at least" idea makes sense. So, I'd change my recommendation to:
>>
>> Pass the buffer size in and out in bytes (for consistency with other
>> APIs). Round the input (F_SETPIPE_SZ) value up as required by the
>> implementation. For the output (F_GETPIPE_SZ) value do one of the
>> following:
>> a) Return the value given on input.
>> b) Return the rounded up value actually used by the kernel.
>>
>> I suspect (b) might be more useful: if an application cares enough
>> about pipe size to want to change it, then at least some such
>> applications might care to know exactly the size that the kernel used.
>> (And: I can't see any downside to (b).)
>
> b definitely, since it's the real size (plus then we don't have to track
> the passed in size).

Okay.
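
For reference, this is roughly how an application would use the interface
as agreed here (bytes in, "at least" semantics, real size reported back).
It is only a sketch: the wrapper name pipe_resize_at_least() is
hypothetical, and it assumes a Linux system whose fcntl() supports
F_SETPIPE_SZ and F_GETPIPE_SZ with byte-based sizes.

```c
#define _GNU_SOURCE /* exposes F_SETPIPE_SZ / F_GETPIPE_SZ on Linux */
#include <fcntl.h>
#include <unistd.h>

/* Fallback for the case where the headers were included without
 * _GNU_SOURCE; these are the Linux values of the two commands. */
#ifndef F_SETPIPE_SZ
#define F_SETPIPE_SZ 1031
#endif
#ifndef F_GETPIPE_SZ
#define F_GETPIPE_SZ 1032
#endif

/*
 * Hypothetical wrapper: ask for a pipe capacity of at least min_bytes
 * and return the capacity the kernel reports afterwards, or -1 with
 * errno set if either fcntl() call fails.
 */
long pipe_resize_at_least(int fd, int min_bytes)
{
    if (fcntl(fd, F_SETPIPE_SZ, min_bytes) == -1)
        return -1;

    /* Ask the kernel what it really allocated (option b). */
    return fcntl(fd, F_GETPIPE_SZ);
}
```

A caller that asks for 120 * 1024 bytes should expect a value at least that
large, and should use the returned value rather than its own request when
sizing buffers.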

>> One other comment about the interface. We have
>>
>>         if (!capable(CAP_SYS_ADMIN) && arg > pipe_max_pages)
>>                 return -EINVAL;
>>
>> The usual error on a capability denied is EPERM. Please change.
>
> Right, that looks like a thinko.
>
> I'll submit a patch changing it to bytes and the agreed API and fix this
> -Eerror. Thanks for your comments and suggestions!

Thanks. And of course you are welcome. (Please CC linux-api(a)vger on
this patch, and on all patches that change the API/ABI.)

Cheers,

Michael


--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Author of "The Linux Programming Interface" http://blog.man7.org/
From: OGAWA Hirofumi on
Jens Axboe <jens.axboe(a)oracle.com> writes:

>> >> I'd recommend this: Pass it in and out in bytes. Don't round to a
>> >> power of 2. Require the user to know what they are doing. Give an
>> >> error if the user doesn't supply a power-of-2 * page-size for
>> >> F_SETPIPE_SZ. (Again, consider the case of architectures with
>> >> switchable page sizes.)
>> >
>> > But is there much point in erroring on an incorrect size? If the
>> > application says "I need at least 120kb of space in there", kernel
>> > returns "OK, you got 128kb". Would returning -1/EINVAL for that case
>> > really make a better API? Doesn't seem like it to me.
>>
>> FWIW, my first impression of this was setsockopt(SO_RCV/SNDBUF) of unix
>> socket. Well, API itself wouldn't say "at least this size" or "exactly
>> this size", so, in here, important thing is consistency of interfaces, I
>> think. (And the both is sane API at least for me if those had
>> consistency in the system.)
>>
>> Well, so how about set/get in bytes, and kernel will set "at least
>> specified size" actually like setsockopt(SO_RCV/SNDBUF)?
>
> Isn't that pretty much what I described?

Yes, probably. Well, 120kb was still multiple of page size. :)

Thanks.
--
OGAWA Hirofumi <hirofumi(a)mail.parknet.co.jp>
From: OGAWA Hirofumi on
Michael Kerrisk <mtk.manpages(a)googlemail.com> writes:

> Actually, SO_*BUF is pretty weird. It returns double what was
> supplied. It's not simply a matter of rounding up: it always doubles
> what was supplied.

Yes. However, well, I'm feeling it also is implementation detail of "at
least". :)
--
OGAWA Hirofumi <hirofumi(a)mail.parknet.co.jp>