vmscan: Kick flusher threads to clean pages when reclaim is encountering dirty pages [Kernel]

Prev: vmscan: Do not writeback filesystem pages in direct reclaim
Next: vmscan: tracing: Update trace event to track if page reclaim IO is for anon or file pages

From: Wu Fengguang on 1 Aug 2010 09:10

On Sun, Aug 01, 2010 at 07:56:40PM +0800, Wu Fengguang wrote:
> > Sigh. We have sooo many problems with writeback and latency. Read
> > https://bugzilla.kernel.org/show_bug.cgi?id=12309 and weep. Everyone's
> > running away from the issue and here we are adding code to solve some
> > alleged stack-overflow problem which seems to be largely a non-problem,
> > by making changes which may worsen our real problems.
>
> I'm sweeping bug 12309. Most people reports some data writes, though
> relative few explicitly stated memory pressure is another necessary
> condition.

#14: Per von Zweigbergk
Ubuntu 2.6.27 slowdown when copying 25MB/s USB stick to 10 MB/s SSD.

KOSAKI and my patches won't fix 2.6.27, since it only do
congestion_wait() and wait_on_page_writeback() for order>3
allocations. There may be more bugs there.

#24: Per von Zweigbergk
The encryption of the SSD very significantly increases the problem.

This is expected. Data encryption roughly doubles page consumption
speed (there may be temp buffers allocated/dropped quickly), hence
vmscan pressure.

#26: Per von Zweigbergk
Disabling swap makes the terminal launch much faster while copying;
However Firefox and vim hang much more aggressively and frequently
during copying.

It's interesting to see processes behave differently. Is this
reproducible at all?

#34: Ben Gamari
There is evidence that x86-64 is a factor here.

Because x86-64 does order-1 page allocation in fork() and consumes
more memory (larger user space code/data)?

#36: Lari Temmes
Go from usable to totally unusable when switching from
a SMP kernel to a UP kernel on a single CPU laptop

He should be testing 2.6.28. I'm not aware of known bugs there.

#47: xyke
Renicing pdflush -10 had some great improvement on basic
responsiveness.

It sure helps :)

Too much (old) messages there. I'm hoping some of the still active
bug reporters to test the following patches (they are for the -mmotm
tree, need to unindent code for Linus's tree) and see if there are
any improvements.

http://lkml.org/lkml/2010/8/1/40
http://lkml.org/lkml/2010/8/1/45

Thanks,
Fengguang
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

From: Trond Myklebust on 1 Aug 2010 12:30

On Sun, 2010-08-01 at 17:19 +0900, KOSAKI Motohiro wrote:
> Hi Trond,
>
> > There is that, and then there are issues with the VM simply lying to the
> > filesystems.
> >
> > See https://bugzilla.kernel.org/show_bug.cgi?id=16056
> >
> > Which basically boils down to the following: kswapd tells the filesystem
> > that it is quite safe to do GFP_KERNEL allocations in pageouts and as
> > part of try_to_release_page().
> >
> > In the case of pageouts, it does set the 'WB_SYNC_NONE', 'nonblocking'
> > and 'for_reclaim' flags in the writeback_control struct, and so the
> > filesystem has at least some hint that it should do non-blocking i/o.
> >
> > However if you trust the GFP_KERNEL flag in try_to_release_page() then
> > the kernel can and will deadlock, and so I had to add in a hack
> > specifically to tell the NFS client not to trust that flag if it comes
> > from kswapd.
>
> Can you please elaborate your issue more? vmscan logic is, briefly, below
>
> if (PageDirty(page))
> pageout(page)
> if (page_has_private(page)) {
> try_to_release_page(page, sc->gfp_mask))
>
> So, I'm interest why nfs need to writeback at ->release_page again even
> though pageout() call ->writepage and it was successfull.
>
> In other word, an argument gfp_mask of try_to_release_page() is suspected
> to pass kmalloc()/alloc_page() familiy. and page allocator have already care
> PF_MEMALLOC flag.
>
> So, My question is, What do you want additional work to VM folks?
> Can you please share nfs design and what we should?
>
>
> btw, Another question, Recently, Xiaotian Feng posted "swap over nfs -v21"
> patch series. they have new reservation memory framework. Is this help you?

The problem that I am seeing is that the try_to_release_page() needs to
be told to act as a non-blocking call when the process is kswapd, just
like the pageout() call.

Currently, the sc->gfp_mask is set to GFP_KERNEL, which normally means
that the call may wait on I/O to complete. However, what I'm seeing in
the bugzilla above is that if kswapd waits on an RPC call, then the
whole VM may gum up: typically, the traces show that the socket layer
cannot allocate memory to hold the RPC reply from the server, and so it
is kicking kswapd to have it reclaim some pages, however kswapd is stuck
in try_to_release_page() waiting for that same I/O to complete, hence
the deadlock...

IOW: I think kswapd at least should be calling try_to_release_page()
with a gfp-flag of '0' to avoid deadlocking on I/O.

Cheers
Trond

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

From: Wu Fengguang on 1 Aug 2010 22:40

Hi Sean,

On Mon, Aug 02, 2010 at 10:17:27AM +0800, Sean Jensen-Grey wrote:
> Wu,
>
> Thank you for doing this. This still bites me on a weekly basis. I don't have much time to test the patches this week, but I should get access to an identical box week after next.

That's OK.

> BTW, I experience the issues even with 8-10GB of free ram. I have 12GB currently.

Thanks for the important information. It means the patches proposed
are not likely to help your case.

In Comment #47 for bug 12309, your kernel 2.6.27 is too old though. You may
well benefit from Jens' CFQ low latency improvements if switching to a recent
kernel.

Thanks,
Fengguang
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

From: Jan Kara on 2 Aug 2010 14:40

On Sat 31-07-10 11:33:22, Mel Gorman wrote:
> On Fri, Jul 30, 2010 at 03:06:01PM -0700, Andrew Morton wrote:
> > Sigh. We have sooo many problems with writeback and latency. Read
> > https://bugzilla.kernel.org/show_bug.cgi?id=12309 and weep.
>
> You aren't joking.
>
> > Everyone's
> > running away from the issue and here we are adding code to solve some
> > alleged stack-overflow problem which seems to be largely a non-problem,
> > by making changes which may worsen our real problems.
> >
>
> As it is, filesystems are beginnning to ignore writeback from direct
> reclaim - such as xfs and btrfs. I'm lead to believe that ext3
> effectively ignores writeback from direct reclaim although I don't have
> access to code at the moment to double check (am on the road). So either
> way, we are going to be facing this problem so the VM might as well be
> aware of it :/
Umm, ext3 should be handling direct reclaim just fine. ext4 does however
ignore it when a page does not have block already allocated (which is a
common case with delayed allocation).

Honza
--
Jan Kara <jack(a)suse.cz>
SUSE Labs, CR
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

First | Prev |
Pages: 1 2 3 4
Prev: vmscan: Do not writeback filesystem pages in direct reclaim
Next: vmscan: tracing: Update trace event to track if page reclaim IO is for anon or file pages