From: Brian Bloniarz on
On 05/24/2010 03:28 AM, Michael Kerrisk wrote:
> Actually, SO_*BUF is pretty weird. It returns double what was
> supplied. It's not simply a matter of rounding up: it always doubles
> what was supplied.

Rationale in net/core/sock.c:

set_rcvbuf:
		sk->sk_userlocks |= SOCK_RCVBUF_LOCK;
		/*
		 * We double it on the way in to account for
		 * "struct sk_buff" etc. overhead. Applications
		 * assume that the SO_RCVBUF setting they make will
		 * allow that much actual data to be received on that
		 * socket.
		 *
		 * Applications are unaware that "struct sk_buff" and
		 * other overheads allocate from the receive buffer
		 * during socket buffer allocation.
		 *
		 * And after considering the possible alternatives,
		 * returning the value we actually used in getsockopt
		 * is the most desirable behavior.
		 */
		if ((val * 2) < SOCK_MIN_RCVBUF)
			sk->sk_rcvbuf = SOCK_MIN_RCVBUF;
		else
			sk->sk_rcvbuf = val * 2;
		break;

I'm guessing pipes don't have this kind of wrinkle.
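The quoted set_rcvbuf logic can be modelled in user space as a pure function. This is a hypothetical sketch, not kernel code: MODEL_SOCK_MIN_RCVBUF and MODEL_RMEM_MAX are assumed example values (the real minimum and the net.core.rmem_max sysctl differ between kernel versions and configurations), and the rmem_max cap that sock_setsockopt() applies just before the set_rcvbuf label is included although the quote does not show it.

```c
/*
 * Hypothetical user-space model of the SO_RCVBUF path quoted above.
 * ASSUMED example values; the real constants vary by kernel and config.
 */
#define MODEL_SOCK_MIN_RCVBUF 256
#define MODEL_RMEM_MAX 131071

/* Returns the value the kernel would store in sk_rcvbuf (and therefore
 * report back through getsockopt) for a requested size val. */
static int model_effective_rcvbuf(int val)
{
	/* The request is first capped at rmem_max (not shown in the quote). */
	if (val > MODEL_RMEM_MAX)
		val = MODEL_RMEM_MAX;

	/* Doubled to cover struct sk_buff overhead, with a lower floor. */
	if ((val * 2) < MODEL_SOCK_MIN_RCVBUF)
		return MODEL_SOCK_MIN_RCVBUF;
	return val * 2;
}
```

With these assumed values, a request of 4096 bytes yields 8192, which is the doubling described at the top of this message.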

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Michael Kerrisk on
On Mon, May 24, 2010 at 4:51 PM, Brian Bloniarz <bmb(a)athenacr.com> wrote:
> On 05/24/2010 03:28 AM, Michael Kerrisk wrote:
>> Actually, SO_*BUF is pretty weird. It returns double what was
>> supplied. It's not simply a matter of rounding up: it always doubles
>> what was supplied.
>
> Rationale in net/core/sock.c:
>
> set_rcvbuf:
> 		sk->sk_userlocks |= SOCK_RCVBUF_LOCK;
> 		/*
> 		 * We double it on the way in to account for
> 		 * "struct sk_buff" etc. overhead.   Applications
> 		 * assume that the SO_RCVBUF setting they make will
> 		 * allow that much actual data to be received on that
> 		 * socket.
> 		 *
> 		 * Applications are unaware that "struct sk_buff" and
> 		 * other overheads allocate from the receive buffer
> 		 * during socket buffer allocation.
> 		 *
> 		 * And after considering the possible alternatives,
> 		 * returning the value we actually used in getsockopt
> 		 * is the most desirable behavior.
> 		 */
> 		if ((val * 2) < SOCK_MIN_RCVBUF)
> 			sk->sk_rcvbuf = SOCK_MIN_RCVBUF;
> 		else
> 			sk->sk_rcvbuf = val * 2;
> 		break;
>
> I'm guessing pipes don't have this kind of wrinkle.

Yes, all of the above is understood. It's exposing these details to
userspace that's weird...
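From userspace, the doubling shows up as a round trip through setsockopt() and getsockopt(). A minimal sketch, assuming Linux; rcvbuf_roundtrip() is a hypothetical helper name, and the value actually reported depends on net.core.rmem_max and the kernel's minimum buffer size.

```c
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: sets SO_RCVBUF on a fresh unix datagram socket and
 * returns what getsockopt() reports back, or -1 on any error. */
static int rcvbuf_roundtrip(int requested)
{
	int fd = socket(AF_UNIX, SOCK_DGRAM, 0);
	if (fd < 0)
		return -1;

	int reported = -1;
	socklen_t len = sizeof(reported);

	if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &requested, sizeof(requested)) < 0 ||
	    getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &reported, &len) < 0)
		reported = -1;

	close(fd);
	return reported;
}
```

On a typical Linux system, rcvbuf_roundtrip(4096) would be expected to report 8192 rather than the 4096 that was requested.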

--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Author of "The Linux Programming Interface" http://blog.man7.org/
From: Jens Axboe on
On Mon, May 24 2010, OGAWA Hirofumi wrote:
> Jens Axboe <jens.axboe(a)oracle.com> writes:
>
> >> >> I'd recommend this: Pass it in and out in bytes. Don't round to a
> >> >> power of 2. Require the user to know what they are doing. Give an
> >> >> error if the user doesn't supply a power-of-2 * page-size for
> >> >> F_SETPIPE_SZ. (Again, consider the case of architectures with
> >> >> switchable page sizes.)
> >> >
> >> > But is there much point in erroring on an incorrect size? If the
> >> > application says "I need at least 120kb of space in there", kernel
> >> > returns "OK, you got 128kb". Would returning -1/EINVAL for that case
> >> > really make a better API? Doesn't seem like it to me.
> >>
> >> FWIW, my first impression of this was setsockopt(SO_RCVBUF/SO_SNDBUF) on a
> >> unix socket. The API itself doesn't say "at least this size" or "exactly
> >> this size", so I think the important thing here is consistency between
> >> interfaces. (Either would be a sane API to me, as long as it was
> >> consistent across the system.)
> >>
> >> So how about set/get in bytes, with the kernel actually setting "at least
> >> the specified size", like setsockopt(SO_RCVBUF/SO_SNDBUF)?
> >
> > Isn't that pretty much what I described?
>
> Yes, probably. Well, 120kb is still a multiple of the page size. :)

It is, but 120KB/page_size is not a power of 2, and that is the
power-of-2 requirement of interest here.
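The arithmetic behind that reply, sketched with hypothetical helpers and assuming 4 KiB pages: 120 KiB is 30 pages, and 30 is not a power of 2, so an "at least this size" policy rounds up to 32 pages (128 KiB), whereas the stricter proposal quoted above would reject the request. This illustrates the two policies only; it is not the kernel's implementation.

```c
#include <stddef.h>

/* Hypothetical helper: rounds a byte count up to a power-of-2 number of
 * pages ("at least this size"). page_size is assumed to be a power of 2.
 * No overflow handling: intended for small illustrative values only. */
static size_t round_up_pipe_bytes(size_t bytes, size_t page_size)
{
	size_t pages = (bytes + page_size - 1) / page_size;
	size_t rounded = 1;

	while (rounded < pages)
		rounded <<= 1;
	return rounded * page_size;
}

/* Hypothetical helper for the stricter alternative: accept only sizes that
 * are already a power-of-2 multiple of the page size (else EINVAL). */
static int is_pow2_pages(size_t bytes, size_t page_size)
{
	size_t pages;

	if (bytes == 0 || bytes % page_size != 0)
		return 0;
	pages = bytes / page_size;
	return (pages & (pages - 1)) == 0;
}
```

With these helpers, round_up_pipe_bytes(120 * 1024, 4096) gives 131072, while is_pow2_pages(120 * 1024, 4096) gives 0.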

--
Jens Axboe

From: Jens Axboe on
On Mon, May 24 2010, Michael Kerrisk wrote:
> > Right, that looks like a thinko.
> >
> > I'll submit a patch changing it to bytes and the agreed API and fix this
> > -Eerror. Thanks for your comments and suggestions!
>
> Thanks. And of course you are welcome. (Please CC linux-api(a)vger on
> this patch, and on all patches that change the API/ABI.)

The first change is this:

http://git.kernel.dk/?p=linux-2.6-block.git;a=commit;h=0191f8697bbdfefcd36e7b8dc3eeddfe82893e4b

and the one dealing with the pages vs bytes API is this:

http://git.kernel.dk/?p=linux-2.6-block.git;a=commit;h=b9598db3401282bb27b4aef77e3eee12015f7f29

Not tested yet, will do so before sending in of course.

--
Jens Axboe

From: Michael Kerrisk on
On Mon, May 24, 2010 at 7:35 PM, Jens Axboe <jens.axboe(a)oracle.com> wrote:
> On Mon, May 24 2010, Michael Kerrisk wrote:
>> > Right, that looks like a thinko.
>> >
>> > I'll submit a patch changing it to bytes and the agreed API and fix this
>> > -Eerror. Thanks for your comments and suggestions!
>>
>> Thanks. And of course you are welcome. (Please CC linux-api(a)vger on
>> this patch, and on all patches that change the API/ABI.)
>
> The first change is this:
>
> http://git.kernel.dk/?p=linux-2.6-block.git;a=commit;h=0191f8697bbdfefcd36e7b8dc3eeddfe82893e4b
>
> and the one dealing with the pages vs bytes API is this:
>
> http://git.kernel.dk/?p=linux-2.6-block.git;a=commit;h=b9598db3401282bb27b4aef77e3eee12015f7f29
>
> Not tested yet, will do so before sending in of course.

Eyeballing it quickly, these changes look right.

Do you have some test programs you can make available?
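A minimal test of the bytes-based interface might look like the following. It is a sketch, assuming the semantics these patches move to (sizes passed and returned in bytes, with F_SETPIPE_SZ returning the capacity actually set, as fcntl(2) documents for Linux 2.6.35 and later). pipe_size_roundtrip() is a hypothetical helper, and the fallback constants are the Linux uapi values, for C libraries whose headers predate them.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <fcntl.h>
#include <unistd.h>

/* Fallbacks for older headers: F_LINUX_SPECIFIC_BASE (1024) + 7 and + 8. */
#ifndef F_SETPIPE_SZ
#define F_SETPIPE_SZ 1031
#endif
#ifndef F_GETPIPE_SZ
#define F_GETPIPE_SZ 1032
#endif

/* Hypothetical helper: asks for at least `requested` bytes of pipe capacity.
 * Stores the F_SETPIPE_SZ result in *set_ret and the F_GETPIPE_SZ result in
 * *get_ret. Returns 0 if both calls succeeded, -1 otherwise. */
static int pipe_size_roundtrip(int requested, int *set_ret, int *get_ret)
{
	int fds[2];
	int rc = -1;

	if (pipe(fds) < 0)
		return -1;

	*set_ret = fcntl(fds[1], F_SETPIPE_SZ, requested);
	*get_ret = fcntl(fds[1], F_GETPIPE_SZ);
	if (*set_ret >= 0 && *get_ret >= 0)
		rc = 0;

	close(fds[0]);
	close(fds[1]);
	return rc;
}
```

With 4 KiB pages, a request for 120 KiB would be expected to come back as 131072 from both calls, matching the "at least this size" behaviour agreed earlier in the thread.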

Thanks,

Michael




--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Author of "The Linux Programming Interface" http://blog.man7.org/