From: Linus Torvalds on


On Tue, 2 Feb 2010, Wu Fengguang wrote:
>
> Some applications (eg. blkid, id3tool etc.) seek around the file
> to get information. For example, blkid does
> seek to 0
> read 1024
> seek to 1536
> read 16384
>
> The start-of-file readahead heuristic is wrong for them, whose
> access pattern can be identified by lseek() calls.
>
> So test-and-set a READAHEAD_LSEEK flag on lseek() and don't
> do start-of-file readahead on seeing it. Proposed by Linus.
>
> CC: Linus Torvalds <torvalds(a)linux-foundation.org>
> Signed-off-by: Wu Fengguang <fengguang.wu(a)intel.com>

Acked-by: Linus Torvalds <torvalds(a)linux-foundation.org>
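
For readers following along, here is a toy userspace model of the idea in
the patch description: remember that lseek() was called, and skip the
start-of-file read-ahead boost when it was. This is only a sketch. The
struct, the function names and the TOY_* sizes are invented for
illustration and are not the kernel's data structures or the patch's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-open-file state; NOT the kernel's struct file_ra_state. */
struct toy_ra_state {
    bool seen_lseek;    /* models the READAHEAD_LSEEK flag named in the patch */
};

/* Illustrative sizes in bytes, chosen arbitrarily for this example. */
enum { TOY_PAGE = 4096, TOY_INITIAL_WINDOW = 16 * TOY_PAGE };

/* Called on every lseek(): record that this file has been seeked. */
static void toy_note_lseek(struct toy_ra_state *ra)
{
    assert(ra != NULL);
    ra->seen_lseek = true;
}

/* Returns how many bytes the model would fetch for a read of `len` bytes
 * at `offset`: the pages the request touches, enlarged to the initial
 * window only for a start-of-file read on a file that was never seeked. */
static size_t toy_read_size(const struct toy_ra_state *ra,
                            size_t offset, size_t len)
{
    assert(ra != NULL);
    size_t first_page = offset / TOY_PAGE;
    size_t end_page = (offset + len + TOY_PAGE - 1) / TOY_PAGE;
    size_t want = (end_page - first_page) * TOY_PAGE;

    if (offset == 0 && !ra->seen_lseek && want < (size_t)TOY_INITIAL_WINDOW)
        return TOY_INITIAL_WINDOW;
    return want;
}
```

In this model the blkid pattern quoted above (seek to 0, read 1024) fetches
one page instead of the whole initial window, because the seek sets the
flag before the first read.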

Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Linus Torvalds on


On Tue, 2 Feb 2010, Olivier Galibert wrote:
>
> Wouldn't that trigger on lseeks to end of file to get the size?

Well, you'd only ever do that with a raw block device, no (if even that:
more "raw block device" tools just use the BLKGETSIZE64 ioctl etc)? Any sane
regular file accessor will do 'fstat()' instead.

And do we care about startup speed of ramping up read-ahead from the
beginning? In fact, the problem case that caused this was literally
'blkid' on a block device - and the fact that the kernel tried to
read-ahead TOO MUCH rather than too little.

If somebody is really doing lots of serial reading, the read-ahead code
will figure it out very quickly. The case this worries about is just the
_first_ read, where the question is one of "do we think it might be
seeking around, or does it look like the user is going to just read the
whole thing"?

IOW, if you start off with a SEEK_END, I think it's reasonable to expect
it to _not_ read the whole thing.

Linus
--
From: Linus Torvalds on


On Tue, 2 Feb 2010, Olivier Galibert wrote:
>
> On Tue, Feb 02, 2010 at 10:40:41AM -0800, Linus Torvalds wrote:
> > IOW, if you start off with a SEEK_END, I think it's reasonable to expect
> > it to _not_ read the whole thing.
>
> I've seen a lot of:
> int fd = open(...);
> size = lseek(fd, 0, SEEK_END);
> lseek(fd, 0, SEEK_SET);
>
> data = malloc(size);
> read(fd, data, size);
> close(fd);
>
> Why not fstat? I don't know.

Well, the above will work perfectly with or without the patch, since it
does the read of the full size. There is no read-ahead hint necessary for
that kind of single read behavior.
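
For comparison, a sketch of the same whole-file read written with fstat()
instead of the two lseek() calls. The helper name read_whole() is invented
here, and using pread() in a loop is a choice made for this example (it
handles short reads and leaves the file offset alone), not something taken
from the quoted code.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: returns a malloc()ed copy of a regular file and
 * stores its length in *out_len, or returns NULL on error. */
static char *read_whole(int fd, size_t *out_len)
{
    struct stat st;

    assert(out_len != NULL);
    if (fstat(fd, &st) != 0 || !S_ISREG(st.st_mode))
        return NULL;

    size_t size = (size_t)st.st_size;
    char *data = malloc(size ? size : 1);   /* avoid malloc(0) */
    if (data == NULL)
        return NULL;

    size_t done = 0;
    while (done < size) {
        ssize_t n = pread(fd, data + done, size - done, (off_t)done);
        if (n < 0) {            /* I/O error; EINTR is not retried here */
            free(data);
            return NULL;
        }
        if (n == 0)             /* file shrank underneath us */
            break;
        done += (size_t)n;
    }
    *out_len = done;
    return data;
}
```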

Remember: read-ahead is about filling the empty IO spaces _between_ reads,
and turning many smaller reads into one bigger one. If you only have a
single big read, read-ahead cannot help.

Also, keep in mind that read-ahead is not always a win. It can be a huge
loss too. Which is why we have _heuristics_. They fundamentally cannot
catch every case, but what they aim for is to do a good job on average.

Linus
--
From: Linus Torvalds on


On Tue, 2 Feb 2010, david(a)lang.hm wrote:

> On Tue, 2 Feb 2010, Linus Torvalds wrote:
> >
> > Also, keep in mind that read-ahead is not always a win. It can be a huge
> > loss too. Which is why we have _heuristics_. They fundamentally cannot
> > catch every case, but what they aim for is to do a good job on average.
>
> as a note from the field, I just had an application that needed to be changed
> because it did excessive read-ahead. it turned a 2 min reporting run into a 20
> min reporting run because for this report the access was really random and the
> app forced large read-ahead.

Yeah. And the reason Wu did this patch is similar: something that _should_
have taken just a quarter of a second took about 7 seconds, because
read-ahead triggered on this really slow device that only feeds about
15kB/s (yes, _kilo_byte, not megabyte).

You can always use POSIX_FADV_RANDOM to disable it, but it's seldom
something that people do. And there are real loads that have random
components to them without being _entirely_ random, so in an optimal world
we should just have heuristics that work well.
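
For applications that do know their access is random, the hint is a single
posix_fadvise() call. A minimal sketch follows; the wrapper name
advise_random() is made up for this example, and how strongly a given
kernel honors the hint is not something this snippet can show.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>

/* Hypothetical wrapper: tell the kernel to expect random access on the
 * whole file. Returns 0 on success or an errno-style error number;
 * posix_fadvise() reports errors via its return value, not via errno. */
static int advise_random(int fd)
{
    assert(fd >= 0);
    /* offset 0 with len 0 means "from the start to the end of the file". */
    return posix_fadvise(fd, 0, 0, POSIX_FADV_RANDOM);
}
```

Call it once, right after open() and before the first read, so that the
hint is in place before any read-ahead decision is made.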

The problem is, it's often easier to test/debug the "good" cases, ie the
cases where we _want_ read-ahead to trigger. So that probably means that
we have a tendency to read-ahead too aggressively, because those cases are
the ones where people can most easily look at it and say "yeah, this
improves throughput of a 'dd bs=8192'".

So then when we find loads where read-ahead hurts, I think we need to take
_that_ case very seriously. Because otherwise our selection bias for
testing read-ahead will fail.

Linus
--
From: david on
On Tue, 2 Feb 2010, Linus Torvalds wrote:

> Remember: read-ahead is about filling the empty IO spaces _between_ reads,
> and turning many smaller reads into one bigger one. If you only have a
> single big read, read-ahead cannot help.
>
> Also, keep in mind that read-ahead is not always a win. It can be a huge
> loss too. Which is why we have _heuristics_. They fundamentally cannot
> catch every case, but what they aim for is to do a good job on average.

as a note from the field, I just had an application that needed to be
changed because it did excessive read-ahead. it turned a 2 min reporting
run into a 20 min reporting run because for this report the access was
really random and the app forced large read-ahead.

David Lang
--