vmscan: delegate pageout io to flusher thread if current is kswapd [Kernel]


From: KOSAKI Motohiro on 15 Apr 2010 04:20

>
> On Apr 14, 2010, at 9:11 PM, KOSAKI Motohiro wrote:
>
> > Now, vmscan pageout() is one source of IO throughput degradation.
> > Some IO workloads do a great many order-0 allocations and reclaims,
> > and pageout's 4K IOs cause an annoying number of seeks.
> >
> > At least kswapd can avoid such pageout() calls, because kswapd doesn't
> > need to consider the OOM-killer situation. There is no risk in that.
> >
> > Signed-off-by: KOSAKI Motohiro <kosaki.motohiro(a)jp.fujitsu.com>
>
> What's your opinion on trying to cluster the writes done by pageout,
> instead of not doing any paging out in kswapd?
> Something along these lines:

Interesting.
So I'd like to review your patch carefully. Can you please give me one
day? :)

>
> Cluster writes to disk due to memory pressure.
>
> Write out logically adjacent pages to the one we're paging out
> so that we may get better IOs in these situations:
> These pages are likely to be contiguous on disk to the one we're
> writing out, so they should get merged into a single disk IO.
>
> Signed-off-by: Suleiman Souhlal <suleiman(a)google.com>

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

From: Suleiman Souhlal on 15 Apr 2010 04:20

On Apr 14, 2010, at 9:11 PM, KOSAKI Motohiro wrote:

> Now, vmscan pageout() is one source of IO throughput degradation.
> Some IO workloads do a great many order-0 allocations and reclaims,
> and pageout's 4K IOs cause an annoying number of seeks.
>
> At least kswapd can avoid such pageout() calls, because kswapd doesn't
> need to consider the OOM-killer situation. There is no risk in that.
>
> Signed-off-by: KOSAKI Motohiro <kosaki.motohiro(a)jp.fujitsu.com>

What's your opinion on trying to cluster the writes done by pageout,
instead of not doing any paging out in kswapd?
Something along these lines:

Cluster writes to disk due to memory pressure.

Write out logically adjacent pages to the one we're paging out
so that we may get better IOs in these situations:
These pages are likely to be contiguous on disk to the one we're
writing out, so they should get merged into a single disk IO.

Signed-off-by: Suleiman Souhlal <suleiman(a)google.com>

diff --git a/mm/vmscan.c b/mm/vmscan.c
index c26986c..4e5a613 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -48,6 +48,8 @@

 #include "internal.h"

+#define PAGEOUT_CLUSTER_PAGES 16
+
 struct scan_control {
 	/* Incremented by the number of inactive pages that were scanned */
 	unsigned long nr_scanned;
@@ -350,6 +352,8 @@ typedef enum {
 static pageout_t pageout(struct page *page, struct address_space *mapping,
 						enum pageout_io sync_writeback)
 {
+	int i;
+
 	/*
 	 * If the page is dirty, only perform writeback if that write
 	 * will be non-blocking. To prevent this allocation from being
@@ -408,6 +412,37 @@ static pageout_t pageout(struct page *page, struct address_space *mapping,
 		}

 		/*
+		 * Try to write out logically adjacent dirty pages too, if
+		 * possible, to get better IOs, as the IO scheduler should
+		 * merge them with the original one, if the file is not too
+		 * fragmented.
+		 */
+		for (i = 1; i < PAGEOUT_CLUSTER_PAGES; i++) {
+			struct page *p2;
+			int err;
+
+			p2 = find_get_page(mapping, page->index + i);
+			if (p2) {
+				if (trylock_page(p2) == 0) {
+					page_cache_release(p2);
+					break;
+				}
+				if (page_mapped(p2))
+					try_to_unmap(p2, 0);
+				if (PageDirty(p2)) {
+					err = write_one_page(p2, 0);
+					page_cache_release(p2);
+					if (err)
+						break;
+				} else {
+					unlock_page(p2);
+					page_cache_release(p2);
+					break;
+				}
+			}
+		}
+
+		/*
 		 * Wait on writeback if requested to. This happens when
 		 * direct reclaiming a large contiguous area and the
 		 * first attempt to free a range of pages fails.
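
The control flow of the clustering loop can be tried outside the kernel with a
minimal userspace sketch. It is not part of the patch. The enum
sim_page_state and the function sim_cluster_pageout() are hypothetical names
made up for illustration: an array slot stands in for a page cache lookup, and
marking a slot clean stands in for write_one_page(). Write errors and
unmapping are not modelled.

```c
#include <assert.h>
#include <stddef.h>

#define PAGEOUT_CLUSTER_PAGES 16

/* Hypothetical stand-in for page cache state; not a kernel type. */
enum sim_page_state {
	SIM_ABSENT,	/* find_get_page() would return NULL */
	SIM_CLEAN,	/* present and not dirty */
	SIM_DIRTY,	/* present and dirty */
	SIM_LOCKED,	/* present, but trylock_page() would fail */
};

/*
 * Mimic the proposed loop: starting just after `index`, "write" (mark
 * clean) the following dirty pages. Absent pages are skipped, while a
 * locked or clean page ends the cluster. Returns the number of extra
 * pages written.
 */
size_t sim_cluster_pageout(enum sim_page_state *pages, size_t npages,
			   size_t index)
{
	size_t written = 0;
	size_t i;

	for (i = 1; i < PAGEOUT_CLUSTER_PAGES; i++) {
		size_t next = index + i;

		if (next >= npages)
			break;		/* end of the simulated file */
		if (pages[next] == SIM_ABSENT)
			continue;	/* hole: keep scanning, as the patch does */
		if (pages[next] != SIM_DIRTY)
			break;		/* locked or clean page stops clustering */
		pages[next] = SIM_CLEAN;	/* stands in for write_one_page() */
		written++;
	}
	return written;
}
```

With the states { DIRTY, DIRTY, ABSENT, DIRTY, CLEAN, DIRTY } and index 0,
the sketch writes the pages at offsets 1 and 3 and stops at the clean page at
offset 4, so the dirty page at offset 5 is left alone.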


From: KOSAKI Motohiro on 15 Apr 2010 04:30

Cc to Johannes

> >
> > On Apr 14, 2010, at 9:11 PM, KOSAKI Motohiro wrote:
> >
> > > Now, vmscan pageout() is one source of IO throughput degradation.
> > > Some IO workloads do a great many order-0 allocations and reclaims,
> > > and pageout's 4K IOs cause an annoying number of seeks.
> > >
> > > At least kswapd can avoid such pageout() calls, because kswapd doesn't
> > > need to consider the OOM-killer situation. There is no risk in that.
> > >
> > > Signed-off-by: KOSAKI Motohiro <kosaki.motohiro(a)jp.fujitsu.com>
> >
> > What's your opinion on trying to cluster the writes done by pageout,
> > instead of not doing any paging out in kswapd?
> > Something along these lines:
>
> Interesting.
> So I'd like to review your patch carefully. Can you please give me one
> day? :)

Hannes, if I remember correctly, you tried a similar swap-cluster IO
patch a long time ago. I can't remember now why we didn't merge it.
Do you remember anything?

>
>
> >
> > Cluster writes to disk due to memory pressure.
> >
> > Write out logically adjacent pages to the one we're paging out
> > so that we may get better IOs in these situations:
> > These pages are likely to be contiguous on disk to the one we're
> > writing out, so they should get merged into a single disk IO.
> >
> > Signed-off-by: Suleiman Souhlal <suleiman(a)google.com>
>
>
>
>


From: KOSAKI Motohiro on 15 Apr 2010 05:50

> On Thu, Apr 15, 2010 at 01:05:57AM -0700, Suleiman Souhlal wrote:
> >
> > On Apr 14, 2010, at 9:11 PM, KOSAKI Motohiro wrote:
> >
> > >Now, vmscan pageout() is one source of IO throughput degradation.
> > >Some IO workloads do a great many order-0 allocations and reclaims,
> > >and pageout's 4K IOs cause an annoying number of seeks.
> > >
> > >At least kswapd can avoid such pageout() calls, because kswapd doesn't
> > >need to consider the OOM-killer situation. There is no risk in that.
> > >
> > >Signed-off-by: KOSAKI Motohiro <kosaki.motohiro(a)jp.fujitsu.com>
> >
> > What's your opinion on trying to cluster the writes done by pageout,
> > instead of not doing any paging out in kswapd?
>
> XFS already does this in ->writepage to try to minimise the impact
> of the way pageout issues IO. It helps, but it is still not as good
> as having all the writeback come from the flusher threads because
> it's still pretty much random IO.

I haven't reviewed that patch yet, so I'm speaking in general terms.
pageout() doesn't only write out file-backed pages; it also writes
swap-backed pages. So neither filesystem optimization nor the flusher
threads remove the value of pageout clustering.

> And, FWIW, it doesn't solve the stack usage problems, either. In
> fact, it will make them worse as write_one_page() puts another
> struct writeback_control on the stack...

Correct. We need to avoid having two writeback_control structs on the
stack. We probably need to split pageout() into several pieces.


From: Johannes Weiner on 15 Apr 2010 06:40

On Thu, Apr 15, 2010 at 05:26:27PM +0900, KOSAKI Motohiro wrote:
> Cc to Johannes
>
> > >
> > > On Apr 14, 2010, at 9:11 PM, KOSAKI Motohiro wrote:
> > >
> > > > Now, vmscan pageout() is one source of IO throughput degradation.
> > > > Some IO workloads do a great many order-0 allocations and reclaims,
> > > > and pageout's 4K IOs cause an annoying number of seeks.
> > > >
> > > > At least kswapd can avoid such pageout() calls, because kswapd doesn't
> > > > need to consider the OOM-killer situation. There is no risk in that.
> > > >
> > > > Signed-off-by: KOSAKI Motohiro <kosaki.motohiro(a)jp.fujitsu.com>
> > >
> > > What's your opinion on trying to cluster the writes done by pageout,
> > > instead of not doing any paging out in kswapd?
> > > Something along these lines:
> >
> > Interesting.
> > So I'd like to review your patch carefully. Can you please give me one
> > day? :)
>
> Hannes, if I remember correctly, you tried a similar swap-cluster IO
> patch a long time ago. I can't remember now why we didn't merge it.
> Do you remember anything?

Oh, quite vividly in fact :) For a lot of swap loads the LRU order
diverged heavily from swap slot order and readaround was a waste of
time.

Of course, the patch looked good, too, but it did not match reality
that well.

I guess 'how about this patch?' won't get us as far as 'how about
those numbers/graphs of several real-life workloads? oh and here
is the patch...'.

> > > Cluster writes to disk due to memory pressure.
> > >
> > > Write out logically adjacent pages to the one we're paging out
> > > so that we may get better IOs in these situations:
> > > These pages are likely to be contiguous on disk to the one we're
> > > writing out, so they should get merged into a single disk IO.
> > >
> > > Signed-off-by: Suleiman Souhlal <suleiman(a)google.com>

For random IO, LRU order will have nothing to do with mapping/disk order.
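
One way to picture the mismatch between LRU order and mapping order is to
count how many separate contiguous runs a sequence of page offsets forms when
the pages are written in a given order. The following is only a userspace
sketch under strong assumptions: count_io_runs() and sort_offsets() are
hypothetical helpers, adjacent offsets are taken to be contiguous on disk, and
any sorting or merging done by the IO scheduler is ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Count runs of consecutive page offsets in the order the writes are
 * issued. Each run approximates one mergeable IO; each new run
 * approximates a seek.
 */
size_t count_io_runs(const unsigned long *offsets, size_t n)
{
	size_t runs = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (i == 0 || offsets[i] != offsets[i - 1] + 1)
			runs++;
	}
	return runs;
}

/* Three-way comparison of two offsets for qsort(). */
static int cmp_offset(const void *a, const void *b)
{
	unsigned long x = *(const unsigned long *)a;
	unsigned long y = *(const unsigned long *)b;

	return (x > y) - (x < y);
}

/*
 * Sort the offsets in place. This stands in for writeback that walks
 * a mapping in index order instead of LRU order.
 */
void sort_offsets(unsigned long *offsets, size_t n)
{
	qsort(offsets, n, sizeof(*offsets), cmp_offset);
}
```

For the made-up LRU order { 7, 2, 5, 3, 6, 4, 8, 1 }, count_io_runs() reports
eight runs, one per page. After sort_offsets() the same eight pages form a
single run.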
