From: Jens Axboe on
On Tue, May 25 2010, Michael Kerrisk wrote:
> On Mon, May 24, 2010 at 7:56 PM, Jens Axboe <jens.axboe(a)oracle.com> wrote:
> > On Mon, May 24 2010, Michael Kerrisk wrote:
> >> On Mon, May 24, 2010 at 7:35 PM, Jens Axboe <jens.axboe(a)oracle.com> wrote:
> >> > On Mon, May 24 2010, Michael Kerrisk wrote:
> >> >> > Right, that looks like a thinko.
> >> >> >
> >> >> > I'll submit a patch changing it to bytes and the agreed API and fix this
> >> >> > -Eerror. Thanks for your comments and suggestions!
> >> >>
> >> >> Thanks. And of course you are welcome. (Please CC linux-api(a)vger on
> >> >> this patche (and all patches that change the API/ABI.)
> >> >
> >> > The first change is this:
> >> >
> >> > http://git.kernel.dk/?p=linux-2.6-block.git;a=commit;h=0191f8697bbdfefcd36e7b8dc3eeddfe82893e4b
> >> >
> >> > and the one dealing with the pages vs bytes API is this:
> >> >
> >> > http://git.kernel.dk/?p=linux-2.6-block.git;a=commit;h=b9598db3401282bb27b4aef77e3eee12015f7f29
> >> >
> >> > Not tested yet, will do so before sending in of course.
> >>
> >> Eyeballing it quickly, these changes look right.
> >
> > Good, thanks.
> >
> >> Do you have some test programs you can make available?
> >
> > Actually I don't, I test it by modifying fio's splice engine to set/get
> > the pipe size and test the resulting transfers.
>
> Two more questions: is the rationale for this feature written up
> somewhere? I could not find it. Is it primarily intended for
> splice/vmsplice/tee, with the effect for pipe(2) being a side effect?

Yes it's primarily for splice, where the 64kb size can sometimes become
a limiting factor because of the pipe mutex lock/unlocking.

> Also, the minuimum size of the buffer is 2 pages. Why is it not 1?
> (Notwithstanding Linus's assertion, a buffer size of 1 page did give
> us POSIX compliance in kernels before 2.6.10.)

I'll defer to Linus on that, I remember some emails on that part from
way back when. As far as I can tell, POSIX wants atomic writes of "less
than a page size", which would make more sense as "of a page size and
less". And since it should not be a page size from either side on a
uni-directional pipe, then 1 page seems enough for that guarantee at
least.

--
Jens Axboe

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Linus Torvalds on


On Tue, 1 Jun 2010, Jens Axboe wrote:
>
> > Also, the minuimum size of the buffer is 2 pages. Why is it not 1?
> > (Notwithstanding Linus's assertion, a buffer size of 1 page did give
> > us POSIX compliance in kernels before 2.6.10.)
>
> I'll defer to Linus on that, I remember some emails on that part from
> way back when. As far as I can tell, POSIX wants atomic writes of "less
> than a page size", which would make more sense as "of a page size and
> less". And since it should not be a page size from either side on a
> uni-directional pipe, then 1 page seems enough for that guarantee at
> least.

Hmm. You guys may well be right that a single slot is sufficient. It still
gives us PIPE_BUF worth of data for writing atomically. I had this memory
that we needed two because of the merging logic (we have that special case
for re-using the previous page, so that we don't use waste of memory for
lots of small writes), but looking at the code there is no reason at all
for me to hav thought so.

So I don't know why I thought we needed the extra slot, and a single slot
(if anybody really wants slow writes) looks to be fine.

Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Loke, Chetan on
> -----Original Message-----
> From: linux-kernel-owner(a)vger.kernel.org [mailto:linux-kernel-
> owner(a)vger.kernel.org] On Behalf Of Linus Torvalds
> Sent: June 01, 2010 11:22 AM
>
> On Tue, 1 Jun 2010, Jens Axboe wrote:
> >
> > > Also, the minuimum size of the buffer is 2 pages. Why is it not 1?
> > > (Notwithstanding Linus's assertion, a buffer size of 1 page did
> give us POSIX compliance in kernels before 2.6.10.)
> >
> > I'll defer to Linus on that, I remember some emails on that part
from
> > way back when. As far as I can tell, POSIX wants atomic writes of
> > "less than a page size", which would make more sense as "of a page
size and
> > less". And since it should not be a page size from either side on a
> > uni-directional pipe, then 1 page seems enough for that guarantee at
> > least.
>
> Hmm. You guys may well be right that a single slot is sufficient. It
> still gives us PIPE_BUF worth of data for writing atomically. I had
this
> memory that we needed two because of the merging logic (we have that
special
> case for re-using the previous page, so that we don't use waste of
memory
> for lots of small writes), but looking at the code there is no reason
at
> all for me to hav thought so.
>
> So I don't know why I thought we needed the extra slot, and a single
> slot (if anybody really wants slow writes) looks to be fine.
>

Ok, I have a really dumb/basic question. The reason we are letting users
grow the pipe->buffers is to decrease the number of splice-calls. This
implies the user has fnctl'd(when he/she wants performance). Can we not
have an option where we don't have to 'alloc pipe->buffers' worth pages
every single time? As an example look at 'default_file_splice_read'. Is
it possible to enhance the existing functionality by defining a new cmd
and a flag(in struct pipe_xxx etc) and allowing an user to control that?
Something like 'fcntl->F_SETPIPE_SZ_AND_LOCK_PIPE_PAGES'? Does this make
sense?

regards
Chetan Loke
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Michael Kerrisk on
On Tue, Jun 1, 2010 at 9:45 AM, Jens Axboe <axboe(a)kernel.dk> wrote:
> On Thu, May 27 2010, Michael Kerrisk wrote:
>> Jens,
>>
>> On Mon, May 24, 2010 at 7:56 PM, Jens Axboe <jens.axboe(a)oracle.com> wrote:
>> > On Mon, May 24 2010, Michael Kerrisk wrote:
>> >> On Mon, May 24, 2010 at 7:35 PM, Jens Axboe <jens.axboe(a)oracle.com> wrote:
>> >> > On Mon, May 24 2010, Michael Kerrisk wrote:
>> >> >> > Right, that looks like a thinko.
>> >> >> >
>> >> >> > I'll submit a patch changing it to bytes and the agreed API and fix this
>> >> >> > -Eerror. Thanks for your comments and suggestions!
>> >> >>
>> >> >> Thanks. And of course you are welcome. (Please CC linux-api(a)vger on
>> >> >> this patche (and all patches that change the API/ABI.)
>> >> >
>> >> > The first change is this:
>> >> >
>> >> > http://git.kernel.dk/?p=linux-2.6-block.git;a=commit;h=0191f8697bbdfefcd36e7b8dc3eeddfe82893e4b
>> >> >
>> >> > and the one dealing with the pages vs bytes API is this:
>> >> >
>> >> > http://git.kernel.dk/?p=linux-2.6-block.git;a=commit;h=b9598db3401282bb27b4aef77e3eee12015f7f29
>> >> >
>> >> > Not tested yet, will do so before sending in of course.
>> >>
>> >> Eyeballing it quickly, these changes look right.
>> >
>> > Good, thanks.
>> >
>> >> Do you have some test programs you can make available?
>> >
>> > Actually I don't, I test it by modifying fio's splice engine to set/get
>> > the pipe size and test the resulting transfers.
>>
>> An afterthought. Do there not also need to be fixes to the /proc
>> interfaces. I don't think they were included in your revised patches.
>
> I think the proc part can be sanely left in pages, since it's just a
> memory limiter.

I can't see any advantage to using two different units for these
closely related APIs, and it does seem like it could be a source of
confusion. Similar APIs that I can think of like RLIMIT_MEMLOCK and
shmget() SHMMAX that impose per-process memory-related limits use
bytes. Best to be consistent, don't you think?

Cheers,

Michael


--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Author of "The Linux Programming Interface" http://blog.man7.org/
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Jens Axboe on
On Wed, Jun 02 2010, Michael Kerrisk wrote:
> On Tue, Jun 1, 2010 at 9:45 AM, Jens Axboe <axboe(a)kernel.dk> wrote:
> > On Thu, May 27 2010, Michael Kerrisk wrote:
> >> Jens,
> >>
> >> On Mon, May 24, 2010 at 7:56 PM, Jens Axboe <jens.axboe(a)oracle.com> wrote:
> >> > On Mon, May 24 2010, Michael Kerrisk wrote:
> >> >> On Mon, May 24, 2010 at 7:35 PM, Jens Axboe <jens.axboe(a)oracle.com> wrote:
> >> >> > On Mon, May 24 2010, Michael Kerrisk wrote:
> >> >> >> > Right, that looks like a thinko.
> >> >> >> >
> >> >> >> > I'll submit a patch changing it to bytes and the agreed API and fix this
> >> >> >> > -Eerror. Thanks for your comments and suggestions!
> >> >> >>
> >> >> >> Thanks. And of course you are welcome. (Please CC linux-api(a)vger on
> >> >> >> this patche (and all patches that change the API/ABI.)
> >> >> >
> >> >> > The first change is this:
> >> >> >
> >> >> > http://git.kernel.dk/?p=linux-2.6-block.git;a=commit;h=0191f8697bbdfefcd36e7b8dc3eeddfe82893e4b
> >> >> >
> >> >> > and the one dealing with the pages vs bytes API is this:
> >> >> >
> >> >> > http://git.kernel.dk/?p=linux-2.6-block.git;a=commit;h=b9598db3401282bb27b4aef77e3eee12015f7f29
> >> >> >
> >> >> > Not tested yet, will do so before sending in of course.
> >> >>
> >> >> Eyeballing it quickly, these changes look right.
> >> >
> >> > Good, thanks.
> >> >
> >> >> Do you have some test programs you can make available?
> >> >
> >> > Actually I don't, I test it by modifying fio's splice engine to set/get
> >> > the pipe size and test the resulting transfers.
> >>
> >> An afterthought. Do there not also need to be fixes to the /proc
> >> interfaces. I don't think they were included in your revised patches.
> >
> > I think the proc part can be sanely left in pages, since it's just a
> > memory limiter.
>
> I can't see any advantage to using two different units for these
> closely related APIs, and it does seem like it could be a source of
> confusion. Similar APIs that I can think of like RLIMIT_MEMLOCK and
> shmget() SHMMAX that impose per-process memory-related limits use
> bytes. Best to be consistent, don't you think?

But they are different interfaces. I think the 'pass in required size,
return actual size' where actual size is >= required size makes sense
for the syscall part, but for an "admin" interface it is more logical to
deal in pages. Perhaps that's just me and the average admin does not
agree. So while it's just detail, it's also an interface so has some
importance. And if there's consensus that bytes is a cleaner interface
on the proc side as well, then lets change it.

--
Jens Axboe

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/