From: KAMEZAWA Hiroyuki on
On Tue, 15 Jun 2010 20:53:43 -0400
Rik van Riel <riel(a)redhat.com> wrote:

> On 06/15/2010 08:39 PM, KAMEZAWA Hiroyuki wrote:
>
> > Hmm, or do you recommend no dirty-page writeback when a memcg hits its
> > limit? Maybe we'll see a lot of swapping.
> >
> > I want to go with this for a while. Changing memcg's behavior will take
> > some time, as there are only a few developers.
>
> One thing we can do, for kswapd, memcg and direct reclaim alike,
> is to tell the flusher threads to flush pages related to a pageout
> candidate page to disk.
>
> That way the reclaiming processes can wait on some disk IO to
> finish, while the flusher thread takes care of the actual flushing.
>
> That should also fix the "kswapd filesystem IO has really poor IO
> patterns" issue.
>
> There's no reason not to fix this issue the right way.
>
Yes, but this patch just stops writeback. I think it's sane to ask
not to change behavior until there are some useful changes in flusher
threads.

IMO, until flusher threads can work with I/O cgroup, memcg shouldn't
depend on them, because writeback allows stealing resources without it.

Thanks,
-Kame

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: KAMEZAWA Hiroyuki on
On Wed, 16 Jun 2010 10:40:36 +0900
KAMEZAWA Hiroyuki <kamezawa.hiroyu(a)jp.fujitsu.com> wrote:

> On Tue, 15 Jun 2010 20:53:43 -0400
> Rik van Riel <riel(a)redhat.com> wrote:
>
> > On 06/15/2010 08:39 PM, KAMEZAWA Hiroyuki wrote:
> >
> > > Hmm, or do you recommend no dirty-page writeback when a memcg hits its
> > > limit? Maybe we'll see a lot of swapping.
> > >
> > > I want to go with this for a while. Changing memcg's behavior will take
> > > some time, as there are only a few developers.
> >
> > One thing we can do, for kswapd, memcg and direct reclaim alike,
> > is to tell the flusher threads to flush pages related to a pageout
> > candidate page to disk.
> >
> > That way the reclaiming processes can wait on some disk IO to
> > finish, while the flusher thread takes care of the actual flushing.
> >
> > That should also fix the "kswapd filesystem IO has really poor IO
> > patterns" issue.
> >
> > There's no reason not to fix this issue the right way.
> >
> Yes, but this patch just stops writeback. I think it's sane to ask
> not to change behavior until there are some useful changes in flusher
> threads.
>
> IMO, until flusher threads can work with I/O cgroup, memcg shouldn't
> depend on them, because writeback allows stealing resources without it.
>

BTW, copy_from_user/copy_to_user is a _real_ problem. I'm afraid the
following path is much worse than memcg:

handle_mm_fault()
-> handle_pte_fault()
-> do_wp_page()
-> balance_dirty_pages_ratelimited()
-> balance_dirty_pages()
-> writeback_inodes_wbc()
-> writeback_inodes_wb()
-> writeback_sb_inodes()
-> writeback_single_inode()
-> do_writepages()
-> generic_writepages()
-> write_cache_pages() // use on-stack pagevec.
-> writepage()

This may consume much more stack than memcg->writeback does after the
vmscan.c diet.
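
To make the "on-stack pagevec" remark concrete, here is a rough
userspace sketch, not kernel code. The struct mirrors what struct
pagevec is assumed to look like in kernels of this era (two unsigned
longs plus 14 page pointers; check include/linux/pagevec.h in your
tree). The *_sketch names and chain_stack_bytes() are made up for
illustration, and any per-frame sizes fed into it are the caller's own
measurements, not figures from this thread.

```c
#include <stddef.h>

/* Assumed layout, modelled on the 2.6.3x-era struct pagevec.
 * Hypothetical mirror for illustration; not the kernel definition. */
#define PAGEVEC_SIZE_SKETCH 14

struct page;	/* opaque: only pointers to it are stored */

struct pagevec_sketch {
	unsigned long nr;
	unsigned long cold;
	struct page *pages[PAGEVEC_SIZE_SKETCH];
};

/* Bytes that one on-stack pagevec adds to a function's frame. */
size_t pagevec_stack_bytes(void)
{
	return sizeof(struct pagevec_sketch);
}

/* Sum caller-supplied per-frame sizes along a call chain. */
size_t chain_stack_bytes(const size_t *frame_bytes, size_t nframes)
{
	size_t total = 0;

	for (size_t i = 0; i < nframes; i++)
		total += frame_bytes[i];
	return total;
}
```

Under these assumptions a single pagevec costs 128 bytes on a typical
64-bit build, before counting any other locals in the same frame.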

Bye.
-Kame

From: Christoph Hellwig on
On Wed, Jun 16, 2010 at 09:17:55AM +0900, KAMEZAWA Hiroyuki wrote:
> yes. It's only called from
> - page fault
> - add_to_page_cache()
>
> I think we'll see no stack problem. For now, memcg doesn't wake up kswapd
> to reclaim memory, so it needs direct writeback.

The page fault code should be fine, but add_to_page_cache can be called
with quite deep stacks. Two examples are grab_cache_page_write_begin
which already was part of one of the stack overflows mentioned in this
thread, or find_or_create_page which can be called via
_xfs_buf_lookup_pages, which can be called from under the whole XFS
allocator, or via grow_dev_page which might have a similarly deep
stack for users of the normal buffer cache. Although for the
find_or_create_page we usually should not have __GFP_FS set in the
gfp_mask.
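
A simplified sketch of the __GFP_FS point: reclaim should only call
into filesystem writeback when the allocating context permitted it.
The SK_* bit values below are illustrative placeholders (the real
definitions live in include/linux/gfp.h), and the real check in
vmscan.c has an additional swap-cache case that is left out here.

```c
#include <stdbool.h>

/* Placeholder bit values for illustration only. */
#define SK_GFP_WAIT	0x10u
#define SK_GFP_IO	0x40u
#define SK_GFP_FS	0x80u

/* Modelled on GFP_NOFS and GFP_KERNEL: NOFS omits the FS bit. */
#define SK_GFP_NOFS	(SK_GFP_WAIT | SK_GFP_IO)
#define SK_GFP_KERNEL	(SK_GFP_WAIT | SK_GFP_IO | SK_GFP_FS)

/* True if reclaim may call ->writepage for this allocation context. */
bool may_enter_fs(unsigned int gfp_mask)
{
	return (gfp_mask & SK_GFP_FS) != 0;
}
```

So a find_or_create_page caller that passes a NOFS-style mask would
skip the deep writeback path even if its own stack is already deep.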

From: Christoph Hellwig on
On Wed, Jun 16, 2010 at 09:39:58AM +0900, KAMEZAWA Hiroyuki wrote:
> Hmm. But I don't expect copy_from/to_user to be called on a very deep stack.

Actually it is. The poll code mentioned earlier in this thread is just
one nasty example. I'm pretty sure there are tons of others in ioctl
code, as various ioctl implementations have been found to be massive
stack hogs in the past, and it is even worse for out-of-tree drivers.

From: Christoph Hellwig on
On Wed, Jun 16, 2010 at 11:20:24AM +0900, KAMEZAWA Hiroyuki wrote:
> BTW, copy_from_user/copy_to_user is a _real_ problem. I'm afraid the
> following path is much worse than memcg:
>
> handle_mm_fault()
> -> handle_pte_fault()
> -> do_wp_page()
> -> balance_dirty_pages_ratelimited()
> -> balance_dirty_pages()
> -> writeback_inodes_wbc()
> -> writeback_inodes_wb()
> -> writeback_sb_inodes()
> -> writeback_single_inode()
> -> do_writepages()
> -> generic_writepages()
> -> write_cache_pages() // use on-stack pagevec.
> -> writepage()

Yes, this is a massive issue. Strangely enough I just wondered about
this callstack as balance_dirty_pages is the only place calling into the
per-bdi/sb writeback code directly instead of offloading it to the
flusher threads. It's something that should be fixed rather quickly
IMHO. write_cache_pages and other bits of this writeback code can use
quite large amounts of stack.
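
The offloading idea can be sketched as a request queue: the deep-stack
caller only records what it wants flushed, and a flusher drains the
queue later from its own shallow stack. This is a single-threaded
userspace toy with no locking or waiting, and it is not the per-bdi
flusher API; struct flush_queue, request_flush(), flusher_run() and
noop_writeback() are all hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define FLUSH_QUEUE_LEN 16	/* hypothetical capacity */

struct flush_queue {
	unsigned long inode_ids[FLUSH_QUEUE_LEN];
	size_t head, tail, count;
	size_t flushed;		/* requests completed by the flusher */
};

void flush_queue_init(struct flush_queue *q)
{
	assert(q != NULL);
	q->head = q->tail = q->count = q->flushed = 0;
}

/* Deep-stack side: record the request and return immediately. */
bool request_flush(struct flush_queue *q, unsigned long inode_id)
{
	assert(q != NULL);
	if (q->count == FLUSH_QUEUE_LEN)
		return false;	/* full: caller has to back off and retry */
	q->inode_ids[q->tail] = inode_id;
	q->tail = (q->tail + 1) % FLUSH_QUEUE_LEN;
	q->count++;
	return true;
}

/* Placeholder for the real, stack-hungry writeback work. */
void noop_writeback(unsigned long inode_id)
{
	(void)inode_id;
}

/* Flusher side: drain every pending request, return how many ran. */
size_t flusher_run(struct flush_queue *q, void (*writeback)(unsigned long))
{
	size_t handled = 0;

	assert(q != NULL && writeback != NULL);
	while (q->count > 0) {
		unsigned long id = q->inode_ids[q->head];

		q->head = (q->head + 1) % FLUSH_QUEUE_LEN;
		q->count--;
		writeback(id);
		q->flushed++;
		handled++;
	}
	return handled;
}
```

The point of the split is that write_cache_pages-style work then runs
on top of the flusher's stack rather than on top of whatever the
dirtying task had already pushed.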