From: Mel Gorman on
On Mon, Jul 19, 2010 at 10:23:49AM -0400, Christoph Hellwig wrote:
> On Mon, Jul 19, 2010 at 02:11:30PM +0100, Mel Gorman wrote:
> > + /*
> > + * If reclaim is encountering dirty pages, it may be because
> > + * dirty pages are reaching the end of the LRU even though
> > + * the dirty_ratio may be satisfied. In this case, wake
> > + * flusher threads to pro-actively clean some pages
> > + */
> > + wakeup_flusher_threads(laptop_mode ? 0 : nr_dirty + nr_dirty / 2);
> > +
>
> Where is the laptop-mode magic coming from?
>

It comes from other parts of page reclaim, which avoid writing pages
where possible. Things like this:

wakeup_flusher_threads(laptop_mode ? 0 : total_scanned);

and

.may_writepage = !laptop_mode

although the latter can get disabled too. Deleting the magic is an
option; it would favour IO efficiency at the expense of power efficiency,
but my current thinking is that laptop mode should prefer reduced power.

> And btw, at least currently wakeup_flusher_threads writes back nr_pages
> for each BDI, which might not be what you want.

I saw you pointing that out in another thread all right although I can't
remember the context. It's not exactly what I want but then again we
really want writing back of pages from a particular zone which we don't
get either. There did not seem to be an ideal here and this appeared to
be "less bad" than the alternatives.
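
To make the per-BDI point concrete, here is a small userspace sketch (not
kernel code). struct bdi_model and total_pages_requested() are made-up
names for illustration, and skipping devices with no dirty IO is an
assumption of the model. It only shows that a single request of nr_pages
scales with the number of devices that have dirty data:

```c
#include <stddef.h>

/* Hypothetical stand-in for a backing device, not the kernel structure. */
struct bdi_model {
	int has_dirty_io;	/* non-zero if the device has dirty data queued */
};

/*
 * Model of "nr_pages for each BDI": every device with dirty IO is asked
 * to write nr_pages, so the system-wide total grows with the number of
 * such devices.
 */
long total_pages_requested(const struct bdi_model *bdis, size_t nr_bdis,
			   long nr_pages)
{
	long total = 0;
	size_t i;

	for (i = 0; i < nr_bdis; i++) {
		if (!bdis[i].has_dirty_io)
			continue;	/* assumed: idle devices are skipped */
		total += nr_pages;	/* each busy device gets the full request */
	}
	return total;
}
```

With two busy devices and a request of 48 pages, the model asks for 96
pages in total, which is the over-request being discussed.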

> Then again probably
> no caller wants it, but I don't see an easy way to fix it.
>

I didn't either but my writeback-foo is weak (getting better but still weak). I
hoped to bring it up at MM Summit and maybe at the Filesystem Summit too to
see what ideas exist to improve this.

When this idea was first floated, you called it a band-aid and I
prioritised writing back old inodes over this. How do you feel about
this approach now?

--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Mel Gorman on
On Tue, Jul 20, 2010 at 12:48:39AM +0200, Johannes Weiner wrote:
> On Mon, Jul 19, 2010 at 03:37:37PM +0100, Mel Gorman wrote:
> > On Mon, Jul 19, 2010 at 10:23:49AM -0400, Christoph Hellwig wrote:
> > > On Mon, Jul 19, 2010 at 02:11:30PM +0100, Mel Gorman wrote:
> > > > + /*
> > > > + * If reclaim is encountering dirty pages, it may be because
> > > > + * dirty pages are reaching the end of the LRU even though
> > > > + * the dirty_ratio may be satisfied. In this case, wake
> > > > + * flusher threads to pro-actively clean some pages
> > > > + */
> > > > + wakeup_flusher_threads(laptop_mode ? 0 : nr_dirty + nr_dirty / 2);
> > > > +
> > >
> > > Where is the laptop-mode magic coming from?
> > >
> >
> > It comes from other parts of page reclaim where writing pages is avoided
> > by page reclaim where possible. Things like this
> >
> > wakeup_flusher_threads(laptop_mode ? 0 : total_scanned);
>
> Actually, it's not avoiding writing pages in laptop mode, instead it
> is lumping writeouts aggressively (as I wrote in my other mail,
> .nr_pages=0 means 'write everything') to keep disk spinups rare and
> make maximum use of them.
>

You're right, 0 does mean flush everything - /me slaps self. It was introduced
in 2.6.6 with the patch "[PATCH] laptop mode". Quoting from it:

Algorithm: the idea is to hold dirty data in memory for a long time,
but to flush everything which has been accumulated if the disk happens
to spin up for other reasons.

So, the reason for the magic is half right (avoid excessive disk spin-ups)
but my reasoning for it was wrong. I thought it was avoiding cleaning to
save power. What it is actually intended to do is "if we are spinning up the
disk anyway, do as much work as possible so it can spin down for longer later".
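
A userspace sketch of that convention, for illustration only. The function
names and the total_dirty parameter are hypothetical; total_dirty stands in
for whatever system-wide dirty count the flusher would use when asked for
"everything":

```c
/*
 * What reclaim asks for: in laptop mode the request is 0, otherwise it
 * is bounded by the amount of scanning done.
 */
long reclaim_flush_request(int laptop_mode, long total_scanned)
{
	return laptop_mode ? 0 : total_scanned;
}

/*
 * How the request is interpreted in this model: 0 is not "write nothing"
 * but "write everything that has accumulated".
 */
long effective_flush_target(long nr_pages, long total_dirty)
{
	if (nr_pages == 0)
		return total_dirty;	/* flush all dirty data in one go */
	return nr_pages;
}
```

So with 5000 dirty pages and 64 pages scanned, the model writes 64 pages
normally but all 5000 in laptop mode, batching the work into one spin-up.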

Where it's wrong is that it should only wake up flusher threads if dirty
pages were encountered. What it's doing right now is potentially
cleaning everything. It means I need to rerun all the tests and see
whether the number of dirty pages encountered by page reclaim is really
reduced, or whether it only looked that way because I was calling
wakeup_flusher_threads(0) when no dirty pages were encountered.

> > although the latter can get disabled too. Deleting the magic is an
> > option which would trade IO efficiency for power efficiency but my
> > current thinking is laptop mode preferred reduced power.
>
> Maybe couple your wakeup with sc->may_writepage? It is usually false
> for laptop_mode but direct reclaimers enable it at one point in
> do_try_to_free_pages() when it scanned more than 150% of the reclaim
> target, so you could use existing disk spin-up points instead of
> introducing new ones or disabling the heuristics in laptop mode.
>

How about the following?

	if (nr_dirty && sc->may_writepage)
		wakeup_flusher_threads(laptop_mode ? 0 :
						nr_dirty + nr_dirty / 2);


1. Wake up flusher threads if dirty pages are encountered
2. For direct reclaim, only wake them up if may_writepage is set,
   indicating that the system is ready to spin up disks and start
   reclaiming
3. In laptop_mode, flush everything to reduce future spin-ups
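
The three rules can be exercised outside the kernel with a small model.
This is only a sketch: struct scan_control_model and should_wake_flushers()
are invented names, and the real struct scan_control has many more fields.

```c
#include <stdbool.h>

/* Hypothetical cut-down stand-in for the reclaim scan control. */
struct scan_control_model {
	bool may_writepage;
};

/*
 * Returns true if the flusher threads should be woken, storing the
 * request in *nr_pages. A stored value of 0 means "flush everything".
 */
bool should_wake_flushers(const struct scan_control_model *sc,
			  unsigned long nr_dirty, bool laptop_mode,
			  unsigned long *nr_pages)
{
	/* Rules 1 and 2: need dirty pages and permission to write */
	if (nr_dirty == 0 || !sc->may_writepage)
		return false;

	/* Rule 3: laptop mode flushes everything, otherwise 1.5 * nr_dirty */
	*nr_pages = laptop_mode ? 0 : nr_dirty + nr_dirty / 2;
	return true;
}
```

The point of returning a separate flag is that "do not wake" and "wake
with a request of 0" are different outcomes, which is exactly what the
unconditional call got wrong.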

--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab
From: Mel Gorman on
On Mon, Jul 26, 2010 at 03:28:32PM +0800, Wu Fengguang wrote:
> On Mon, Jul 19, 2010 at 09:11:30PM +0800, Mel Gorman wrote:
> > There are a number of cases where pages get cleaned but two of concern
> > to this patch are;
> > o When dirtying pages, processes may be throttled to clean pages if
> > dirty_ratio is not met.
> > o Pages belonging to inodes dirtied longer than
> > dirty_writeback_centisecs get cleaned.
> >
> > The problem for reclaim is that dirty pages can reach the end of the LRU
> > if pages are being dirtied slowly so that neither the throttling cleans
> > them or a flusher thread waking periodically.
> >
> > Background flush is already cleaning old or expired inodes first but the
> > expire time is too far in the future at the time of page reclaim. To mitigate
> > future problems, this patch wakes flusher threads to clean 1.5 times the
> > number of dirty pages encountered by reclaimers. The reasoning is that pages
> > were being dirtied at a roughly constant rate recently so if N dirty pages
> > were encountered in this scan block, we are likely to see roughly N dirty
> > pages again soon so try keep the flusher threads ahead of reclaim.
> >
> > This is unfortunately very hand-wavy but there is not really a good way of
> > quantifying how bad it is when reclaim encounters dirty pages other than
> > "down with that sort of thing". Similarly, there is not an obvious way of
> > figuring out what percentage of dirty pages are old in terms of LRU-age and
> > should be cleaned. Ideally, the background flushers would only be cleaning
> > pages belonging to the zone being scanned but it's not clear if this would
> > be of benefit (less IO) or not (potentially less efficient IO if an inode
> > is scattered across multiple zones).
> >
> > Signed-off-by: Mel Gorman <mel(a)csn.ul.ie>
> > ---
> > mm/vmscan.c | 18 +++++++++++-------
> > 1 files changed, 11 insertions(+), 7 deletions(-)
> >
> > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > index bc50937..5763719 100644
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -806,6 +806,8 @@ restart_dirty:
> > }
> >
> > if (PageDirty(page)) {
> > + nr_dirty++;
> > +
> > /*
> > * If the caller cannot writeback pages, dirty pages
> > * are put on a separate list for cleaning by either
> > @@ -814,7 +816,6 @@ restart_dirty:
> > if (!reclaim_can_writeback(sc, page)) {
> > list_add(&page->lru, &dirty_pages);
> > unlock_page(page);
> > - nr_dirty++;
> > goto keep_dirty;
> > }
> >
> > @@ -933,13 +934,16 @@ keep_dirty:
> > VM_BUG_ON(PageLRU(page) || PageUnevictable(page));
> > }
> >
> > + /*
> > + * If reclaim is encountering dirty pages, it may be because
> > + * dirty pages are reaching the end of the LRU even though
> > + * the dirty_ratio may be satisfied. In this case, wake
> > + * flusher threads to pro-actively clean some pages
> > + */
> > + wakeup_flusher_threads(laptop_mode ? 0 : nr_dirty + nr_dirty / 2);
>
> Ah it's very possible that nr_dirty==0 here! Then you are hitting the
> number of dirty pages down to 0 whether or not pageout() is called.
>

True, this has been fixed to only wake up flusher threads when this is
the file LRU, dirty pages have been encountered and the caller has
sc->may_writepage set.

> Another minor issue is, the passed (nr_dirty + nr_dirty / 2) is
> normally a small number, much smaller than MAX_WRITEBACK_PAGES.
> The flusher will sync at least MAX_WRITEBACK_PAGES pages, this is good
> for efficiency.
> And it seems good to let the flusher write much more
> than nr_dirty pages to safeguard a reasonable large
> vmscan-head-to-first-dirty-LRU-page margin. So it would be enough to
> update the comments.
>

Ok, the reasoning had been to flush a number of pages that was related
to the scanning rate but if that is inefficient for the flusher, I'll
use MAX_WRITEBACK_PAGES.

Thanks

--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab
From: Mel Gorman on
On Mon, Jul 26, 2010 at 07:27:09PM +0800, Wu Fengguang wrote:
> > > > @@ -933,13 +934,16 @@ keep_dirty:
> > > > VM_BUG_ON(PageLRU(page) || PageUnevictable(page));
> > > > }
> > > >
> > > > + /*
> > > > + * If reclaim is encountering dirty pages, it may be because
> > > > + * dirty pages are reaching the end of the LRU even though
> > > > + * the dirty_ratio may be satisfied. In this case, wake
> > > > + * flusher threads to pro-actively clean some pages
> > > > + */
> > > > + wakeup_flusher_threads(laptop_mode ? 0 : nr_dirty + nr_dirty / 2);
> > >
> > > Ah it's very possible that nr_dirty==0 here! Then you are hitting the
> > > number of dirty pages down to 0 whether or not pageout() is called.
> > >
> >
> > True, this has been fixed to only wakeup flusher threads when this is
> > the file LRU, dirty pages have been encountered and the caller has
> > sc->may_writepage.
>
> OK.
>
> > > Another minor issue is, the passed (nr_dirty + nr_dirty / 2) is
> > > normally a small number, much smaller than MAX_WRITEBACK_PAGES.
> > > The flusher will sync at least MAX_WRITEBACK_PAGES pages, this is good
> > > for efficiency.
> > > And it seems good to let the flusher write much more
> > > than nr_dirty pages to safeguard a reasonable large
> > > vmscan-head-to-first-dirty-LRU-page margin. So it would be enough to
> > > update the comments.
> > >
> >
> > Ok, the reasoning had been to flush a number of pages that was related
> > to the scanning rate but if that is inefficient for the flusher, I'll
> > use MAX_WRITEBACK_PAGES.
>
> It would be better to pass something like (nr_dirty * N).
> MAX_WRITEBACK_PAGES may be increased to 128MB in the future, which is
> obviously too large as a parameter. When the batch size is increased
> to 128MB, the writeback code may be improved somehow to not exceed the
> nr_pages limit too much.
>

What might be a useful value for N? 1.5 appears to work reasonably well
to create a window of writeback ahead of the scanner but it's a bit
arbitrary.
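
For experimenting with other values of N without floating point, the factor
can be written as a ratio. A sketch only; scale_dirty() and its parameters
are hypothetical names, not anything in the patch:

```c
/*
 * Scale a dirty page count by num/den using integer arithmetic.
 * With num = 3 and den = 2 this gives the same result as
 * nr_dirty + nr_dirty / 2, rounding down for odd counts.
 * The caller is assumed to pass den != 0 and values small enough
 * that nr_dirty * num does not overflow.
 */
unsigned long scale_dirty(unsigned long nr_dirty, unsigned long num,
			  unsigned long den)
{
	return nr_dirty * num / den;
}
```

Note that for very small counts the rounding matters: one dirty page with
N = 1.5 still requests only one page, so the window ahead of the scanner
is effectively zero.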

--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab
From: Mel Gorman on
On Mon, Jul 26, 2010 at 09:10:08PM +0800, Wu Fengguang wrote:
> On Mon, Jul 26, 2010 at 08:57:17PM +0800, Mel Gorman wrote:
> > On Mon, Jul 26, 2010 at 07:27:09PM +0800, Wu Fengguang wrote:
> > > > > > @@ -933,13 +934,16 @@ keep_dirty:
> > > > > > VM_BUG_ON(PageLRU(page) || PageUnevictable(page));
> > > > > > }
> > > > > >
> > > > > > + /*
> > > > > > + * If reclaim is encountering dirty pages, it may be because
> > > > > > + * dirty pages are reaching the end of the LRU even though
> > > > > > + * the dirty_ratio may be satisfied. In this case, wake
> > > > > > + * flusher threads to pro-actively clean some pages
> > > > > > + */
> > > > > > + wakeup_flusher_threads(laptop_mode ? 0 : nr_dirty + nr_dirty / 2);
> > > > >
> > > > > Ah it's very possible that nr_dirty==0 here! Then you are hitting the
> > > > > number of dirty pages down to 0 whether or not pageout() is called.
> > > > >
> > > >
> > > > True, this has been fixed to only wakeup flusher threads when this is
> > > > the file LRU, dirty pages have been encountered and the caller has
> > > > sc->may_writepage.
> > >
> > > OK.
> > >
> > > > > Another minor issue is, the passed (nr_dirty + nr_dirty / 2) is
> > > > > normally a small number, much smaller than MAX_WRITEBACK_PAGES.
> > > > > The flusher will sync at least MAX_WRITEBACK_PAGES pages, this is good
> > > > > for efficiency.
> > > > > And it seems good to let the flusher write much more
> > > > > than nr_dirty pages to safeguard a reasonable large
> > > > > vmscan-head-to-first-dirty-LRU-page margin. So it would be enough to
> > > > > update the comments.
> > > > >
> > > >
> > > > Ok, the reasoning had been to flush a number of pages that was related
> > > > to the scanning rate but if that is inefficient for the flusher, I'll
> > > > use MAX_WRITEBACK_PAGES.
> > >
> > > It would be better to pass something like (nr_dirty * N).
> > > MAX_WRITEBACK_PAGES may be increased to 128MB in the future, which is
> > > obviously too large as a parameter. When the batch size is increased
> > > to 128MB, the writeback code may be improved somehow to not exceed the
> > > nr_pages limit too much.
> > >
> >
> > What might be a useful value for N? 1.5 appears to work reasonably well
> > to create a window of writeback ahead of the scanner but it's a bit
> > arbitrary.
>
> I'd recommend N to be a large value. It's no longer relevant now since
> we'll call the flusher to sync some range containing the target page.
> The flusher will then choose an N large enough (eg. 4MB) for efficient
> IO. It needs to be a large value, otherwise the vmscan code will
> quickly run into dirty pages again..
>

Ok, I took the 4MB at face value to be a "reasonable amount that should
not cause congestion". The end result is

#define MAX_WRITEBACK (4194304UL >> PAGE_SHIFT)
#define WRITEBACK_FACTOR (MAX_WRITEBACK / SWAP_CLUSTER_MAX)
static inline long nr_writeback_pages(unsigned long nr_dirty)
{
	return laptop_mode ? 0 :
			min(MAX_WRITEBACK, (nr_dirty * WRITEBACK_FACTOR));
}

nr_writeback_pages(nr_dirty) is what gets passed to
wakeup_flusher_threads(). Does that seem sensible?
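
To see what numbers this produces, here is a userspace model of the same
arithmetic. It is a sketch, not the patch: the page shift and laptop mode
are passed as parameters instead of being kernel globals, the _model names
are invented, and SWAP_CLUSTER_MAX is assumed to be 32.

```c
#define SWAP_CLUSTER_MAX_MODEL 32UL	/* assumed value of SWAP_CLUSTER_MAX */
#define MAX_WRITEBACK_BYTES 4194304UL	/* the 4MB cap from the mail */

/* Cap on the request, in pages, for a given page size. */
unsigned long max_writeback_model(unsigned int page_shift)
{
	return MAX_WRITEBACK_BYTES >> page_shift;
}

/*
 * Same calculation as nr_writeback_pages() above: scale nr_dirty so that
 * a full SWAP_CLUSTER_MAX batch of dirty pages maps to the 4MB cap, and
 * never ask for more than the cap. Laptop mode returns 0, which means
 * "flush everything".
 */
long nr_writeback_pages_model(unsigned long nr_dirty, int laptop_mode,
			      unsigned int page_shift)
{
	unsigned long max_wb = max_writeback_model(page_shift);
	unsigned long factor = max_wb / SWAP_CLUSTER_MAX_MODEL;
	unsigned long scaled = nr_dirty * factor;

	if (laptop_mode)
		return 0;
	return (long)(scaled < max_wb ? scaled : max_wb);
}
```

With 4K pages (page_shift of 12) the cap is 1024 pages and the factor is
32, so a single dirty page requests 32 pages and anything from 32 dirty
pages upwards requests the full 1024. With 64K pages the cap drops to 64
pages and the factor to 2.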


--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab