From: Nick Piggin on
On Wed, May 19, 2010 at 11:45:42AM -0400, Steven Rostedt wrote:
> On Wed, 2010-05-19 at 17:33 +0200, Miklos Szeredi wrote:
> > On Wed, 19 May 2010, Linus Torvalds wrote:
> > > Btw, since you apparently have a real case - is the "splice to file"
> > > always just an append? IOW, if I'm not right in assuming that the only
> > > sane thing people would reasonable care about is "append to a file", then
> > > holler now.
> >
> > Virtual machines might reasonably need this for splicing to a disk
> > image.
>
> This comes down to balancing speed and complexity. Perhaps a copy is
> fine in this case.
>
> I'm concerned about high speed tracing, where we are always just taking
> pages from the trace ring buffer and appending them to a file or sending
> them off to the network. The slower this is, the more likely you will
> lose events.
>
> If the "move only on append to file" is easy to implement, I would
> really like to see that happen. The speed of splicing a disk image for a
> virtual machine only impacts the patience of the user. The speed of
> splicing tracing output, impacts how much you can trace without losing
> events.

It's not "easy" to implement :) What does your ring buffer look like?
Is it a normal user address which the kernel does copy_to_user()ish
things into? Or a mmapped special driver?

If the latter, it gets even harder again. But either way if the
source pages just have to be regenerated anyway (eg. via page fault
on next access), then it might not even be worthwhile to do the
splice move.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Mathieu Desnoyers on
* Steven Rostedt (rostedt(a)goodmis.org) wrote:
> On Wed, 2010-05-19 at 17:33 +0200, Miklos Szeredi wrote:
> > On Wed, 19 May 2010, Linus Torvalds wrote:
> > > Btw, since you apparently have a real case - is the "splice to file"
> > > always just an append? IOW, if I'm not right in assuming that the only
> > > sane thing people would reasonable care about is "append to a file", then
> > > holler now.
> >
> > Virtual machines might reasonably need this for splicing to a disk
> > image.
>
> This comes down to balancing speed and complexity. Perhaps a copy is
> fine in this case.
>
> I'm concerned about high speed tracing, where we are always just taking
> pages from the trace ring buffer and appending them to a file or sending
> them off to the network. The slower this is, the more likely you will
> lose events.
>
> If the "move only on append to file" is easy to implement, I would
> really like to see that happen. The speed of splicing a disk image for a
> virtual machine only impacts the patience of the user. The speed of
> splicing tracing output, impacts how much you can trace without losing
> events.

I'm with Steven here. I only care about appending full pages at the end of a
file. If possible, I'd also like to steal back the pages after waiting for the
writeback I/O to complete so we can put them back in the ring buffer without
stressing the page cache and the page allocator needlessly.

Thanks,

Mathieu


--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com
From: Mathieu Desnoyers on
* Steven Rostedt (rostedt(a)goodmis.org) wrote:
> On Wed, 2010-05-19 at 07:59 -0700, Linus Torvalds wrote:
> >
>
> > Btw, since you apparently have a real case - is the "splice to file"
> > always just an append? IOW, if I'm not right in assuming that the only
> > sane thing people would reasonable care about is "append to a file", then
> > holler now.
>
> My use case is just to move the data from the ring buffer into a file
> (or network) as fast as possible. It creates a new file and all
> additions are "append to a file".
>
> I believe Mathieu does the same.
>
> With me, you are correct.

Same here. My ring buffer only ever uses splice() to append at the end of a
file or to the network, and always outputs data in multiples of the page size.
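
For readers following along, here is a rough user-space sketch of that
append-only pattern. It is not taken from either tracer: the helper name
splice_append() is hypothetical, and it assumes in_fd is anything splice() can
read from and out_fd is a freshly created file that was NOT opened with
O_APPEND (splice() rejects O_APPEND targets with EINVAL). Passing a NULL
output offset lets the file position advance, so every chunk lands at the end.
SPLICE_F_MOVE is only a hint; the kernel may still copy.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: move everything readable from in_fd to the current
 * position of out_fd through a pipe, one page-sized chunk at a time.
 * Returns the number of bytes written, or -1 on error.
 */
ssize_t splice_append(int in_fd, int out_fd)
{
    int pipefd[2];
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    ssize_t total = 0;
    int failed = 0;

    if (pipe(pipefd) < 0)
        return -1;

    while (!failed) {
        /* Source -> pipe: ask for at most one page. */
        ssize_t n = splice(in_fd, NULL, pipefd[1], NULL, page, SPLICE_F_MOVE);
        if (n < 0 && errno == EINTR)
            continue;
        if (n < 0)
            failed = 1;
        if (n <= 0)
            break;                      /* 0 means end of input */

        /* Pipe -> file: drain exactly what was just queued. */
        while (n > 0) {
            ssize_t w = splice(pipefd[0], NULL, out_fd, NULL,
                               (size_t)n, SPLICE_F_MOVE);
            if (w < 0 && errno == EINTR)
                continue;
            if (w <= 0) {
                failed = 1;
                break;
            }
            n -= w;
            total += w;
        }
    }

    close(pipefd[0]);
    close(pipefd[1]);
    return failed ? -1 : total;
}
```

Whether a given chunk is actually moved or copied is up to the kernel, which
is the unpredictability discussed elsewhere in this thread.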

Thanks,

Mathieu

>
> -- Steve
>
>

--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com
From: Miklos Szeredi on
On Wed, 19 May 2010, Linus Torvalds wrote:
> On Wed, 19 May 2010, Miklos Szeredi wrote:
> >
> > Another limitation I found while splicing from one file to another is
> > that stealing from the source file's page cache does not always
> > succeed. This turned out to be because of a reference from the lru
> > cache for freshly read pages. I'm not sure how this could be fixed.
>
> It should be fixed by saying "you can't always just move the page".
>
> Copying is not evil. Complexity to avoid copies is evil.

And predictability is good. The thing I don't like about the above is
that it makes it totally unpredictable which pages will get moved, if
any.

Another related thing: if we splice from a file knowing that the pages
will need to be stolen, then it makes zero sense to first insert them
into the page cache and then remove them shortly afterwards, only to
insert them into another file's cache. So we could have a flag saying
"don't cache newly read pages, just put them in the pipe buffer", which
would solve the above problem as well as speed up the operation.

Miklos
From: Mathieu Desnoyers on
* Nick Piggin (npiggin(a)suse.de) wrote:
> On Wed, May 19, 2010 at 11:45:42AM -0400, Steven Rostedt wrote:
> > On Wed, 2010-05-19 at 17:33 +0200, Miklos Szeredi wrote:
> > > On Wed, 19 May 2010, Linus Torvalds wrote:
> > > > Btw, since you apparently have a real case - is the "splice to file"
> > > > always just an append? IOW, if I'm not right in assuming that the only
> > > > sane thing people would reasonable care about is "append to a file", then
> > > > holler now.
> > >
> > > Virtual machines might reasonably need this for splicing to a disk
> > > image.
> >
> > This comes down to balancing speed and complexity. Perhaps a copy is
> > fine in this case.
> >
> > I'm concerned about high speed tracing, where we are always just taking
> > pages from the trace ring buffer and appending them to a file or sending
> > them off to the network. The slower this is, the more likely you will
> > lose events.
> >
> > If the "move only on append to file" is easy to implement, I would
> > really like to see that happen. The speed of splicing a disk image for a
> > virtual machine only impacts the patience of the user. The speed of
> > splicing tracing output, impacts how much you can trace without losing
> > events.
>
> It's not "easy" to implement :) What does your ring buffer look like?
> Is it a normal user address which the kernel does copy_to_user()ish
> things into? Or a mmapped special driver?
>
> If the latter, it gets even harder again. But either way if the
> source pages just have to be regenerated anyway (eg. via page fault
> on next access), then it might not even be worthwhile to do the
> splice move.

Steven and I use pages to which we write directly by using the page address from
the linear memory mapping returned by page_address(). These pages have no other
mapping. They are moved to the pipe, and then from the pipe to a file (or to the
network). It's possibly the simplest scenario you could think of for splice().
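
The in-kernel path cannot be shown as a standalone program, but a loose
user-space analogue of "pages go into the pipe, then from the pipe to a file"
might look like the sketch below. Everything here is an assumption made for
illustration: the helper name gift_pages_to_file() is hypothetical, the pages
come from an anonymous mmap() rather than a tracer's ring buffer, and
vmsplice() with SPLICE_F_GIFT stands in for handing whole pages to the pipe.
Gifted memory must be page-aligned and must not be modified afterwards.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Hypothetical helper: fill npages whole pages with `fill`, gift them to a
 * pipe, then splice them from the pipe to the current position of out_fd.
 * Returns the number of bytes written, or -1 on error.
 */
ssize_t gift_pages_to_file(int out_fd, size_t npages, char fill)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t len = npages * page;
    size_t off = 0;
    ssize_t done = -1;
    int pipefd[2];
    char *buf;

    /* Page-aligned, page-multiple buffer with no other users. */
    buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return -1;
    memset(buf, fill, len);

    if (pipe(pipefd) < 0)
        goto unmap;

    while (off < len) {
        struct iovec iov = { .iov_base = buf + off, .iov_len = len - off };

        /* Pages -> pipe. May queue fewer pages than asked if the pipe fills. */
        ssize_t queued = vmsplice(pipefd[1], &iov, 1, SPLICE_F_GIFT);
        if (queued <= 0)
            goto close_pipe;
        off += (size_t)queued;

        /* Pipe -> file. SPLICE_F_MOVE is a hint; a copy is still possible. */
        while (queued > 0) {
            ssize_t w = splice(pipefd[0], NULL, out_fd, NULL,
                               (size_t)queued, SPLICE_F_MOVE);
            if (w <= 0)
                goto close_pipe;
            queued -= w;
        }
    }
    done = (ssize_t)len;

close_pipe:
    close(pipefd[0]);
    close(pipefd[1]);
unmap:
    munmap(buf, len);
    return done;
}
```

The buffer is never written again after the vmsplice() call and is only
unmapped once the pipe has been drained.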

Thanks,

Mathieu


--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com