From: Mel Gorman on
On Wed, Jul 07, 2010 at 01:03:38PM +0800, Wu Fengguang wrote:
> Hi Mel,
>
> > Second, using systemtap, I was able to see that file-backed dirty
> > pages have a tendency to be near the end of the LRU even though they
> > are a small percentage of the overall pages in the LRU. I'm hoping
> > to figure out why this is as it would make avoiding writeback a lot
> > less controversial.
>
> Your intuitions are correct -- the current background writeback logic
> fails to write older inodes first. Under heavy loads the background
> writeback job may run forever, totally ignoring the time order of
> inode->dirtied_when. This is probably why you see lots of dirty pages
> near the end of the LRU.
>

Possible. In a mail to Christoph, I asserted that writeback of older inodes
was happening first, but I obviously could be mistaken.

> Here is an old patch for fixing this. Sorry for being late. I'll
> pick up and refresh the patch series ASAP. (I made a mistake last
> year to post too many patches at one time. I'll break them up into
> more manageable pieces.)
>
> [PATCH 31/45] writeback: sync old inodes first in background writeback
> <https://kerneltrap.org/mailarchive/linux-fsdevel/2009/10/7/6476313>
>

I'll check it out as an alternative to forward-flushing based on the
amount of dirty pages encountered during scanning. Thanks.
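
The ordering issue discussed above can be illustrated with a small userspace
sketch. It is not the code from the referenced patch: struct dirty_inode,
cmp_dirtied_when() and queue_expired_first() are hypothetical names, and the
timestamps are plain integers rather than jiffies, so wraparound is ignored.
It only shows the idea of ordering dirty inodes by dirtied_when and taking
the ones older than a cutoff first.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the inode fields that matter here. */
struct dirty_inode {
    unsigned long ino;          /* inode number */
    unsigned long dirtied_when; /* time first dirtied; smaller means older */
};

/* qsort comparator: ascending dirtied_when, so the oldest inode sorts first. */
static int cmp_dirtied_when(const void *a, const void *b)
{
    const struct dirty_inode *x = a;
    const struct dirty_inode *y = b;

    if (x->dirtied_when < y->dirtied_when)
        return -1;
    if (x->dirtied_when > y->dirtied_when)
        return 1;
    return 0;
}

/*
 * Sort the inodes oldest-first and return how many of them were dirtied
 * at or before older_than. Those inodes now occupy the front of the
 * array and would be written back before any younger inode.
 */
static size_t queue_expired_first(struct dirty_inode *inodes, size_t n,
                                  unsigned long older_than)
{
    size_t expired = 0;

    qsort(inodes, n, sizeof(*inodes), cmp_dirtied_when);
    while (expired < n && inodes[expired].dirtied_when <= older_than)
        expired++;
    return expired;
}
```

A background flusher built this way would work through the first `expired`
entries before touching anything dirtied more recently, which is the time
ordering the current logic is said to ignore under heavy load.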

--
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Rik van Riel on
On 07/07/2010 05:43 AM, Mel Gorman wrote:

> How do you suggest tuning this? The modification I tried was "if N dirty
> pages are found during a SWAP_CLUSTER_MAX scan of pages, assume an average
> dirtying density of at least that during the time those pages were inserted on
> the LRU. In response, ask the flushers to flush 1.5X". This roughly responds
> to the conditions it finds as they are encountered and is based on scanning
> rates instead of time. It seemed like a reasonable option.

Your idea sounds like something we need to have, regardless
of whether or not we fix the flusher to flush older inodes
first (we probably should do that, too).

I believe this for the simple reason that we could have too
many dirty pages in one memory zone, while the flusher's
dirty threshold is system wide.

If we both fix the flusher to flush old inodes first and
kick the flusher from the reclaim code, we should be
golden.
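
The heuristic quoted at the top of this message can be sketched roughly as
follows. This is a userspace illustration, not the patch that was tested:
struct scanned_page and flush_request() are hypothetical names, and "1.5X"
is assumed to mean 1.5 times the number of dirty file-backed pages seen in
the batch. SWAP_CLUSTER_MAX is given the value 32 here.

```c
#include <stdbool.h>
#include <stddef.h>

#define SWAP_CLUSTER_MAX 32

/* Hypothetical per-page state seen while scanning the tail of an LRU list. */
struct scanned_page {
    bool dirty;
    bool file_backed;
};

/*
 * Count dirty file-backed pages in one scan batch (at most
 * SWAP_CLUSTER_MAX pages) and return how many pages the flusher
 * would be asked to write: 1.5 times that count, rounded down.
 */
static unsigned long flush_request(const struct scanned_page *batch,
                                   size_t nr_scanned)
{
    unsigned long nr_dirty = 0;
    size_t i;

    for (i = 0; i < nr_scanned && i < SWAP_CLUSTER_MAX; i++) {
        if (batch[i].file_backed && batch[i].dirty)
            nr_dirty++;
    }
    return nr_dirty + nr_dirty / 2;
}
```

Because the count comes from the pages reclaim is actually scanning, the
request tracks the dirty density of that particular LRU list, which is the
per-zone property a system-wide dirty threshold cannot see.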

--
All rights reversed
From: Christoph Hellwig on
On Wed, Jul 07, 2010 at 01:03:38PM +0800, Wu Fengguang wrote:
> Here is an old patch for fixing this. Sorry for being late. I'll
> pick up and refresh the patch series ASAP. (I made a mistake last
> year to post too many patches at one time. I'll break them up into
> more manageable pieces.)

Yes, that would be very welcome. There's a lot of important work
in that series.

From: KOSAKI Motohiro on
> On Tue, Jul 06, 2010 at 04:25:39PM +0100, Mel Gorman wrote:
> > On Tue, Jul 06, 2010 at 08:24:57PM +0900, Minchan Kim wrote:
> > > but it is still a problem in the case of a swap file.
> > > That's because swapout to a swapfile causes filesystem writepage, which
> > > overflows the kernel stack.
> >
> > I don't *think* this is a problem unless I missed where writing out to
> > swap enters the filesystem code. I'll double check.
>
> It bypasses the fs. On swapon, the blocks are resolved
> (mm/swapfile.c::setup_swap_extents) and then the writeout path uses
> bios directly (mm/page_io.c::swap_writepage).

Yeah, my fault. I did misunderstand this.

Thank you.



>
> (GFP_NOFS still includes __GFP_IO, so allows swapping)
>
> Hannes
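
The GFP_NOFS remark above can be made concrete with a small sketch. The
flag relationships follow what Hannes states (GFP_NOFS includes __GFP_IO
but not __GFP_FS); the numeric bit values are illustrative assumptions, and
may_enter_fs() here is a standalone helper modelled on the check reclaim
makes before writing a page, not a copy of kernel code.

```c
#include <stdbool.h>

/* Bit values are illustrative; only the relationships between masks matter. */
#define __GFP_WAIT 0x10u
#define __GFP_IO   0x40u
#define __GFP_FS   0x80u

#define GFP_NOIO   (__GFP_WAIT)
#define GFP_NOFS   (__GFP_WAIT | __GFP_IO)
#define GFP_KERNEL (__GFP_WAIT | __GFP_IO | __GFP_FS)

/*
 * Decide whether reclaim may start writeout for a page.
 * File pages need __GFP_FS because writepage enters filesystem code.
 * Swap-cache pages only need __GFP_IO because the swap writeout path
 * submits bios directly and never calls into the filesystem.
 */
static bool may_enter_fs(unsigned int gfp_mask, bool page_in_swap_cache)
{
    return (gfp_mask & __GFP_FS) ||
           (page_in_swap_cache && (gfp_mask & __GFP_IO));
}
```

Under these assumptions a GFP_NOFS allocation can still push anonymous
pages out to swap, including to a swap file, while a GFP_NOIO allocation
cannot.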


