vmscan: Do not writeback pages in direct reclaim [Kernel]

From: Andrew Morton on 11 Jun 2010 02:20

On Tue, 8 Jun 2010 10:02:25 +0100 Mel Gorman <mel(a)csn.ul.ie> wrote:

> When memory is under enough pressure, a process may enter direct
> reclaim to free pages in the same manner kswapd does. If a dirty page is
> encountered during the scan, this page is written to backing storage using
> mapping->writepage. This can result in very deep call stacks, particularly
> if the target storage or filesystem are complex. It has already been observed
> on XFS that the stack overflows but the problem is not XFS-specific.
>
> This patch prevents direct reclaim writing back pages by not setting
> may_writepage in scan_control. Instead, dirty pages are placed back on the
> LRU lists for either background writing by the BDI threads or kswapd. If
> in direct lumpy reclaim and dirty pages are encountered, the process will
> kick the background flusher threads before trying again.
>
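
The behaviour described in the patch text above can be illustrated with a small userspace model. This is only a sketch under stated assumptions: `page_model`, `scan_control_model`, `reclaim_result` and `shrink_page_list_model` are hypothetical stand-ins, not the kernel's definitions, and the only thing taken from the thread is the idea that a reclaimer without `may_writepage` puts dirty pages back on the LRU and asks background flushers to do the writing. The patch text describes the flusher kick for direct lumpy reclaim; the model does not distinguish lumpy from ordinary reclaim.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a page; only the dirty state is modelled. */
struct page_model {
    bool dirty;
};

/* Hypothetical stand-in for scan_control; mirrors the may_writepage flag. */
struct scan_control_model {
    bool may_writepage;
};

struct reclaim_result {
    size_t reclaimed;     /* clean pages freed, plus dirty pages written and freed */
    size_t written;       /* dirty pages written by this reclaimer */
    size_t put_back;      /* dirty pages returned to the LRU untouched */
    bool   kick_flushers; /* true if background writeback should be woken */
};

/* Walk a batch of pages taken from the LRU tail and decide what to do
 * with each one, depending on whether this reclaimer may write pages. */
static struct reclaim_result
shrink_page_list_model(struct page_model *pages, size_t n,
                       const struct scan_control_model *sc)
{
    struct reclaim_result r = {0};

    assert(pages != NULL && sc != NULL);

    for (size_t i = 0; i < n; i++) {
        if (!pages[i].dirty) {
            r.reclaimed++;          /* clean page: free it directly */
            continue;
        }
        if (sc->may_writepage) {
            pages[i].dirty = false; /* models mapping->writepage completing */
            r.written++;
            r.reclaimed++;
        } else {
            r.put_back++;           /* leave the page dirty, back on the LRU */
            r.kick_flushers = true; /* let background threads do the write */
        }
    }
    return r;
}
```

With `may_writepage` cleared, a batch of two clean and two dirty pages yields two reclaimed pages and two put back; with it set, all four are reclaimed, at the cost of calling into writeback from the reclaimer's own stack.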

This wouldn't have worked at all well back in the days when you could
dirty all memory with MAP_SHARED. The balance_dirty_pages() calls on
the fault path will now save us but if for some reason we were ever to
revert those, we'd need to revert this change too, I suspect.

As it stands, it would be wildly incautious to make a change like
this without first working out why we're pulling so many dirty pages
off the LRU tail, and fixing that.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

From: Mel Gorman on 11 Jun 2010 09:00

On Thu, Jun 10, 2010 at 11:17:06PM -0700, Andrew Morton wrote:
> On Tue, 8 Jun 2010 10:02:25 +0100 Mel Gorman <mel(a)csn.ul.ie> wrote:
>
> > When memory is under enough pressure, a process may enter direct
> > reclaim to free pages in the same manner kswapd does. If a dirty page is
> > encountered during the scan, this page is written to backing storage using
> > mapping->writepage. This can result in very deep call stacks, particularly
> > if the target storage or filesystem are complex. It has already been observed
> > on XFS that the stack overflows but the problem is not XFS-specific.
> >
> > This patch prevents direct reclaim writing back pages by not setting
> > may_writepage in scan_control. Instead, dirty pages are placed back on the
> > LRU lists for either background writing by the BDI threads or kswapd. If
> > in direct lumpy reclaim and dirty pages are encountered, the process will
> > kick the background flusher threads before trying again.
> >
>
> This wouldn't have worked at all well back in the days when you could
> dirty all memory with MAP_SHARED.

Yes, it would have been a bucket of fail.

> The balance_dirty_pages() calls on
> the fault path will now save us but if for some reason we were ever to
> revert those, we'd need to revert this change too, I suspect.
>

Quite likely.

> As it stands, it would be wildly incautious to make a change like
> this without first working out why we're pulling so many dirty pages
> off the LRU tail, and fixing that.
>

Ok, I have a series prepared for testing that is in three parts.

Patches 1-4: tracepoints to gather how many dirty pages there really are
being written out on the LRU
Patches 5-10: reduce the stack usage in page reclaim
Patches 9-10: Avoid writing out pages from direct reclaim and instead
kick background flushers to do the writing

Patches 1-4 on their own should give an accurate view of how many dirty pages
are really being written back and whether it's a real problem or not.
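
As a rough illustration of the kind of accounting such tracepoints could feed, here is a minimal userspace sketch. The actual tracepoints are not shown in this thread, so every name here (`reclaim_context`, `reclaim_stats`, `record_scan`, `dirty_percent`) is hypothetical; the sketch only shows how dirty pages seen at the LRU tail might be counted separately for kswapd and for direct reclaim.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical: which kind of reclaimer scanned the page. */
enum reclaim_context {
    RECLAIM_KSWAPD,
    RECLAIM_DIRECT,
    RECLAIM_NR_CONTEXTS
};

/* Hypothetical per-context counters a tracepoint consumer might keep. */
struct reclaim_stats {
    unsigned long scanned[RECLAIM_NR_CONTEXTS];
    unsigned long dirty_seen[RECLAIM_NR_CONTEXTS];
};

/* Record one page taken from the LRU tail by the given context. */
static void record_scan(struct reclaim_stats *st, enum reclaim_context ctx,
                        int page_is_dirty)
{
    assert(st != NULL && ctx < RECLAIM_NR_CONTEXTS);
    st->scanned[ctx]++;
    if (page_is_dirty)
        st->dirty_seen[ctx]++;
}

/* Percentage of scanned pages that were dirty (integer division);
 * returns 0 when nothing has been scanned in that context. */
static unsigned long dirty_percent(const struct reclaim_stats *st,
                                   enum reclaim_context ctx)
{
    assert(st != NULL && ctx < RECLAIM_NR_CONTEXTS);
    if (st->scanned[ctx] == 0)
        return 0;
    return st->dirty_seen[ctx] * 100 / st->scanned[ctx];
}
```

Keeping the two contexts apart matters for this discussion: the stack-depth concern applies to direct reclaim, while the question of why dirty pages reach the LRU tail applies to both.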

--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab

From: Christoph Hellwig on 11 Jun 2010 12:30

On Thu, Jun 10, 2010 at 11:17:06PM -0700, Andrew Morton wrote:
> As it stands, it would be wildly incautious to make a change like
> this without first working out why we're pulling so many dirty pages
> off the LRU tail, and fixing that.

Note that unlike the writepage vs writepages from kswapd which can
be fixed by the right tuning this is a black or white issue. Writeback
from direct reclaim will kill your stack if the caller happens to be
the wrong one, and just making it happen less often is not a fix - it
must not happen at all.

From: Christoph Hellwig on 11 Jun 2010 13:50

On Fri, Jun 11, 2010 at 10:43:31AM -0700, Andrew Morton wrote:
> Of course, but making a change like that in the current VM will cause a
> large number of dirty pages to get refiled, so the impact of this
> change on some workloads could be quite bad.

Note that ext4, btrfs and xfs all error out on ->writepage from reclaim
context. That is both kswapd and direct reclaim because there is no way
to distinguish between the two. Things seem to work fine with these
filesystems, so the issue can't be _that_ bad. Of course reducing this
to just error out from direct reclaim, and fixing the VM to better
cope with it is even better.
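
A minimal userspace sketch of the check described here, assuming a single per-task flag that marks "inside page reclaim": `TASK_IN_RECLAIM`, `task_model`, `fs_page_model` and `writepage_model` are hypothetical names for illustration and are not taken from ext4, btrfs or xfs. Because one flag covers both kswapd and direct reclaim, the model cannot tell the two apart, which is the limitation mentioned above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag: the task is currently running page reclaim
 * (set for kswapd and for direct reclaim alike). */
#define TASK_IN_RECLAIM 0x1u

struct task_model {
    unsigned int flags;
};

struct fs_page_model {
    bool dirty;
    bool under_writeback;
};

enum writepage_status {
    WRITEPAGE_STARTED, /* I/O was started for the page */
    WRITEPAGE_REFUSED  /* called from reclaim: page left dirty */
};

/* Model of a filesystem ->writepage that refuses to do I/O when the
 * caller is in reclaim context, leaving the page for the flushers. */
static enum writepage_status
writepage_model(const struct task_model *task, struct fs_page_model *page)
{
    assert(task != NULL && page != NULL);

    if (task->flags & TASK_IN_RECLAIM) {
        page->dirty = true; /* keep it dirty so regular writeback finds it */
        return WRITEPAGE_REFUSED;
    }

    page->dirty = false;
    page->under_writeback = true;
    return WRITEPAGE_STARTED;
}
```

Narrowing the refusal to direct reclaim only would need a second way to identify kswapd, which this sketch deliberately leaves out.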

From: Andrew Morton on 11 Jun 2010 13:50

On Fri, 11 Jun 2010 12:25:23 -0400 Christoph Hellwig <hch(a)infradead.org> wrote:

> On Thu, Jun 10, 2010 at 11:17:06PM -0700, Andrew Morton wrote:
> > As it stands, it would be wildly incautious to make a change like
> > this without first working out why we're pulling so many dirty pages
> > off the LRU tail, and fixing that.
>
> Note that unlike the writepage vs writepages from kswapd which can
> be fixed by the right tuning this is a black or white issue. Writeback
> from direct reclaim will kill your stack if the caller happens to be
> the wrong one, and just making it happen less often is not a fix - it
> must not happen at all.

Of course, but making a change like that in the current VM will cause a
large number of dirty pages to get refiled, so the impact of this
change on some workloads could be quite bad.

If, however, we can get things back to the state where few dirty pages
ever reach the tail of the LRU then the adverse impact of this change
will be much less.
