From: Phillip Susi on
On 4/20/2010 8:44 PM, Jamie Lokier wrote:
> readahead() doesn't make much sense on a directory - the offset and
> size aren't meaningful.
>
> But does plain opendir/readdir/closedir solve the problem?

No, since those are synchronous. I want to have readahead() queue up
reading the entire directory in the background to avoid blocking, and
get the queue filled with a bunch of requests that can be merged into
larger segments before being dispatched to the hardware.

I don't actually care to have the contents of the directories returned,
so readdir() does more than I need in that respect, and also it performs
a blocking read of one disk block at a time, which is horribly slow with
a cold cache.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Jamie Lokier on
Phillip Susi wrote:
> On 4/20/2010 8:44 PM, Jamie Lokier wrote:
> > readahead() doesn't make much sense on a directory - the offset and
> > size aren't meaningful.
> >
> > But does plain opendir/readdir/closedir solve the problem?
>
> No, since those are synchronous. I want to have readahead() queue up
> reading the entire directory in the background to avoid blocking, and
> get the queue filled with a bunch of requests that can be merged into
> larger segments before being dispatched to the hardware.

Asynchronous is available: Use clone or pthreads.

More broadly: One of the ways to better I/O sorting is to make sure
you've got enough things in parallel that the I/O queue is never
empty, so what you issue has time to get sorted before it reaches the
head of the queue for dispatch. On the other hand, not so many things
in parallel that the queues fill up and throttle. Unfortunately it
only works if things aren't serialised by kernel locks - but there's been
a lot of work on lockless this and that in the kernel, which may help.

Back to your problem: You need a bunch of scattered block requests to
be queued and sorted sanely, and readdir doesn't do that, and even
waits for each block before issuing the next request.

Or does it?

A quick skim of fs/{ext3,ext4}/dir.c finds a call to
page_cache_sync_readahead. Doesn't that do any reading ahead? :-)

> I don't actually care to have the contents of the
> directories returned, so readdir() does more than I need in that
> respect, and also it performs a blocking read of one disk block at a
> time, which is horribly slow with a cold cache.

I/O is the probably the biggest cost, so it's more important to get
the I/O pattern you want than worrying about return values you'll discard.

If readdir() calls are slowed by lots of calls and libc, consider
using the getdirentries system call directly.

If not, fs/ext4/namei.c:ext4_dir_inode_operations points to
ext4_fiemap. So you may have luck calling FIEMAP or FIBMAP on the
directory, and then reading blocks using the block device. I'm not
sure if the cache loaded via the block device (when mounted) will then
be used for directory lookups.

-- Jamie
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Evgeniy Polyakov on
On Wed, Apr 21, 2010 at 07:51:24PM +0100, Jamie Lokier (jamie(a)shareable.org) wrote:
> Fwiw, I found sorting directories by inode and reading them in that
> order help to reduce seeks, some 10 years ago. I implemented
> something like 'find' which works like that, keeping a queue of
> directories to read and things to open/stat, ordered by inode number
> seen in d_ino before open/stat and st_ino after. However it did not
> try to readahead the blocks inside a directory, or sort operations by
> block number. It reduced some 'find'-like operations to about a
> quarter of the time on cold cache. I still use that program sometimes
> before "git status" ;-) Google "treescan" and "lokier" if you're
> interested in trying it (though I use 0.7 which isn't published).

As you might expect it is not really a directory readahead :)
Nad I'm not really sure ext234 can implement it in kernel more optimally
without breaking backward compatibility though.

> > it is not about readdir(). Plain read() is synchronous too. But
> > filesystem can respond to readahead calls and read next block to current
> > one, while it won't do this for next direntry.
>
> I'm surprised it makes much difference, as directories are usually not
> very large anyway.

Well, having several tens of millions of files in 64k dirs takes from tens of
seconds to minutes to read just because of that.

> But if it does, go on, try FIEMAP and blockdev reading, you know you
> want to :-)

Well, it requires substantial underlying fs knowledge and is not simple
and, well, appropriate to do in some cases.

--
Evgeniy Polyakov
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Jamie Lokier on
Evgeniy Polyakov wrote:
> On Wed, Apr 21, 2010 at 05:12:11PM +0100, Jamie Lokier (jamie(a)shareable.org) wrote:
> > A quick skim of fs/{ext3,ext4}/dir.c finds a call to
> > page_cache_sync_readahead. Doesn't that do any reading ahead? :-)
>
> It goes down to fs callbacks of data reading, which is not appliable to
> directories.
>
> To implement directory 'readahead' we use separated thread to call
> readdir(). It is damn slow indeed, but it can populate cache in advance
> of actual data reading. As a higher level crunch there is a 'find'
> running in background.

Fwiw, I found sorting directories by inode and reading them in that
order help to reduce seeks, some 10 years ago. I implemented
something like 'find' which works like that, keeping a queue of
directories to read and things to open/stat, ordered by inode number
seen in d_ino before open/stat and st_ino after. However it did not
try to readahead the blocks inside a directory, or sort operations by
block number. It reduced some 'find'-like operations to about a
quarter of the time on cold cache. I still use that program sometimes
before "git status" ;-) Google "treescan" and "lokier" if you're
interested in trying it (though I use 0.7 which isn't published).

> > > I don't actually care to have the content s of the
> > > directories returned, so readdir() does more than I need in that
> > > respect, and also it performs a blocking read of one disk block at a
> > > time, which is horribly slow with a cold cache.
> >
> > I/O is the probably the biggest cost, so it's more important to get
> > the I/O pattern you want than worrying about return values you'll discard.
> >
> > If readdir() calls are slowed by lots of calls and libc, consider
> > using the getdirentries system call directly.
>
> it is not about readdir(). Plain read() is synchronous too. But
> filesystem can respond to readahead calls and read next block to current
> one, while it won't do this for next direntry.

I'm surprised it makes much difference, as directories are usually not
very large anyway.

But if it does, go on, try FIEMAP and blockdev reading, you know you
want to :-)

-- Jamie
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Evgeniy Polyakov on
On Wed, Apr 21, 2010 at 05:12:11PM +0100, Jamie Lokier (jamie(a)shareable.org) wrote:
> A quick skim of fs/{ext3,ext4}/dir.c finds a call to
> page_cache_sync_readahead. Doesn't that do any reading ahead? :-)

It goes down to fs callbacks of data reading, which is not appliable to
directories.

To implement directory 'readahead' we use separated thread to call
readdir(). It is damn slow indeed, but it can populate cache in advance
of actual data reading. As a higher level crunch there is a 'find'
running in background.

> > I don't actually care to have the contents of the
> > directories returned, so readdir() does more than I need in that
> > respect, and also it performs a blocking read of one disk block at a
> > time, which is horribly slow with a cold cache.
>
> I/O is the probably the biggest cost, so it's more important to get
> the I/O pattern you want than worrying about return values you'll discard.
>
> If readdir() calls are slowed by lots of calls and libc, consider
> using the getdirentries system call directly.

it is not about readdir(). Plain read() is synchronous too. But
filesystem can respond to readahead calls and read next block to current
one, while it won't do this for next direntry.

--
Evgeniy Polyakov
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/