From: KAMEZAWA Hiroyuki on
On Tue, 15 Jun 2010 15:16:01 +0100
Mel Gorman <mel(a)csn.ul.ie> wrote:

> On Tue, Jun 15, 2010 at 10:04:24AM -0400, Rik van Riel wrote:
> > On 06/15/2010 09:59 AM, Mel Gorman wrote:
> >> On Tue, Jun 15, 2010 at 09:34:18AM -0400, Rik van Riel wrote:
> >>> On 06/15/2010 07:45 AM, Mel Gorman wrote:
> >
> >>>>>>
> >>>>>> +/* kswapd and memcg can writeback as they are unlikely to overflow stack */
> >>>>>> +static inline bool reclaim_can_writeback(struct scan_control *sc)
> >>>>>> +{
> >>>>>> + return current_is_kswapd() || sc->mem_cgroup != NULL;
> >>>>>> +}
> >
> >>> If direct reclaim can overflow the stack, so can direct
> >>> memcg reclaim. That means this patch does not solve the
> >>> stack overflow, while admitting that we do need the
> >>> ability to get specific pages flushed to disk from the
> >>> pageout code.
> >>>
> >>
> >> What path is taken with memcg != NULL that could overflow the stack? I
> >> couldn't spot one but mm/memcontrol.c is a bit tangled so finding all
> >> its use cases is tricky. The critical path I had in mind though was
> >> direct reclaim and for that path, memcg == NULL or did I miss something?
> >
> > mem_cgroup_hierarchical_reclaim -> try_to_free_mem_cgroup_pages
> >
>
> But in turn, where is mem_cgroup_hierarchical_reclaim called from direct
> reclaim? It appears to be only called from the fault path or as a result
> of the memcg changing size.
>
yes. It's only called from
- page fault
- add_to_page_cache()

I think we'll see no stack problem. For now, memcg doesn't wake up kswapd
to reclaim memory, so it needs direct writeback.
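
For reference, the check under discussion can be modelled in userspace
roughly as below. This is only a sketch, not the kernel code: struct
scan_control is cut down to the one field that matters, and the is_kswapd
parameter is a hypothetical stand-in for current_is_kswapd(), which looks
at the current task instead of taking an argument.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Cut-down stand-ins for the kernel types; only what the check needs. */
struct mem_cgroup { int id; };

struct scan_control {
    struct mem_cgroup *mem_cgroup;   /* NULL for global reclaim */
};

/*
 * Model of the proposed policy: kswapd and memcg reclaim may write back
 * dirty pages, global direct reclaim may not. is_kswapd is a hypothetical
 * parameter replacing current_is_kswapd().
 */
static bool reclaim_can_writeback_model(const struct scan_control *sc,
                                        bool is_kswapd)
{
    assert(sc != NULL);
    return is_kswapd || sc->mem_cgroup != NULL;
}
```

With this model, global direct reclaim (mem_cgroup == NULL, not kswapd) is
the only caller that is refused writeback.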

Thanks,
-Kame

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Rik van Riel on
On 06/15/2010 08:17 PM, KAMEZAWA Hiroyuki wrote:
> On Tue, 15 Jun 2010 15:16:01 +0100
> Mel Gorman<mel(a)csn.ul.ie> wrote:

>> But in turn, where is mem_cgroup_hierarchical_reclaim called from direct
>> reclaim? It appears to be only called from the fault path or as a result
>> of the memcg changing size.
>>
> yes. It's only called from
> - page fault
> - add_to_page_cache()
>
> I think we'll see no stack problem. For now, memcg doesn't wake up kswapd
> to reclaim memory, so it needs direct writeback.

Of course, a memcg page fault could still be triggered
from copy_to_user or copy_from_user, with a fairly
arbitrary stack frame above...

--
All rights reversed
From: KAMEZAWA Hiroyuki on
On Tue, 15 Jun 2010 14:54:08 +0100
Mel Gorman <mel(a)csn.ul.ie> wrote:

> On Tue, Jun 15, 2010 at 09:37:27AM -0400, Christoph Hellwig wrote:
> > On Tue, Jun 15, 2010 at 09:34:18AM -0400, Rik van Riel wrote:
> > > If direct reclaim can overflow the stack, so can direct
> > > memcg reclaim. That means this patch does not solve the
> > > stack overflow, while admitting that we do need the
> > > ability to get specific pages flushed to disk from the
> > > pageout code.
> >
> > Can you explain what the hell memcg reclaim is and why it needs
> > to reclaim from random contexts?
>
> Kamezawa Hiroyuki has the full story here but here is a summary.
>
Thank you.

> memcg is the Memory Controller cgroup
> (Documentation/cgroups/memory.txt). It's intended for the control of the
> amount of memory usable by a group of processes but its behaviour in
> terms of reclaim differs from global reclaim. It has its own LRU lists
> and kswapd operates on them.

No, we don't use kswapd, but we have some hooks in kswapd for implementing
soft limits. A soft limit gives kswapd a hint, "please reclaim memory
from this memcg", when global memory is exhausted and kswapd runs.

What a memcg uses when it hits its limit is just direct reclaim.
(*) Justifying kswapd's use of CPU time just because a memcg hit its limit
is difficult for me, so I haven't used kswapd so far.
When direct reclaim is used, the cost of reclaim is charged against
the cpu cgroup that the thread belongs to.


> What is surprising is that direct reclaim
> for a process in the control group also does not operate within the
> cgroup.
Sorry, I can't understand ....

>
> Reclaim from a cgroup happens from the fault path. The new page is
> "charged" to the cgroup. If it exceeds its allocated resources, some
> pages within the group are reclaimed in a path that is similar to direct
> reclaim except for its entry point.
>
yes.
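
The charge-then-reclaim behaviour described above could be sketched in
userspace as follows. This is a toy model under stated assumptions: the
toy_memcg structure, its fields, and both function names are hypothetical
and are not the mm/memcontrol.c interfaces. It only shows that reclaim runs
synchronously in the context of the thread doing the charge.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy accounting state for one cgroup; field names are illustrative. */
struct toy_memcg {
    size_t usage;        /* pages currently charged */
    size_t limit;        /* hard limit in pages */
    size_t reclaimable;  /* pages reclaim could free right now */
};

/* Reclaim up to nr pages from this group only; returns pages freed. */
static size_t toy_memcg_reclaim(struct toy_memcg *cg, size_t nr)
{
    size_t freed = nr < cg->reclaimable ? nr : cg->reclaimable;

    cg->reclaimable -= freed;
    cg->usage -= freed;
    return freed;
}

/*
 * Charge one page. If the group is at its limit, reclaim synchronously in
 * the caller's context and then retry the charge. Returns false if nothing
 * could be reclaimed, so the charge cannot be satisfied.
 */
static bool toy_memcg_charge(struct toy_memcg *cg)
{
    assert(cg != NULL);
    if (cg->usage >= cg->limit && toy_memcg_reclaim(cg, 1) == 0)
        return false;
    cg->usage += 1;
    return true;
}
```

Because the reclaim call sits inside the charge function, its stack usage
is added on top of whatever the charging thread was already doing.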

> So, memcg is not reclaiming from a random context, there is a limited
> number of cases where a memcg is reclaiming and it is not expected to
> overflow the stack.
>

I think so. In particular, we'll never see the 1k stack usage of select().

> > It seems everything that has a cg in it's name that I stumbled over
> > lately seems to be some ugly wart..
> >
>
> The wart in this case is that the behaviour of page reclaim within a
> memcg and globally differ a fair bit.
>

Sorry, but there is a very long story behind the current implementation.
Don't worry, though: if memcg is not activated (not mounted), it doesn't
affect the behavior of processes ;)

But Hmm..

>[kamezawa(a)bluextal mmotm-2.6.35-0611]$ wc -l mm/memcontrol.c
>4705 mm/memcontrol.c

may need some diet :(


Thanks,
-Kame


From: KAMEZAWA Hiroyuki on
On Tue, 15 Jun 2010 20:29:49 -0400
Rik van Riel <riel(a)redhat.com> wrote:

> On 06/15/2010 08:17 PM, KAMEZAWA Hiroyuki wrote:
> > On Tue, 15 Jun 2010 15:16:01 +0100
> > Mel Gorman<mel(a)csn.ul.ie> wrote:
>
> >> But in turn, where is mem_cgroup_hierarchical_reclaim called from direct
> >> reclaim? It appears to be only called from the fault path or as a result
> >> of the memcg changing size.
> >>
> > yes. It's only called from
> > - page fault
> > - add_to_page_cache()
> >
> > I think we'll see no stack problem. For now, memcg doesn't wake up kswapd
> > to reclaim memory, so it needs direct writeback.
>
> Of course, a memcg page fault could still be triggered
> from copy_to_user or copy_from_user, with a fairly
> arbitrary stack frame above...
>

Hmm. But I don't expect copy_from/to_user to be called with a very deep stack.

Should I prepare a thread for reclaiming memcg pages?
Because we shouldn't limit kswapd's CPU time with the CFS cgroup, waking up
kswapd just because "a memcg hit its limit" isn't fun.

Hmm, or do you recommend no dirty-page writeback when a memcg hits its limit?
Maybe we'll see a lot of swapping.

I want to go with this for a while. Changing memcg's behavior will take
some time, and there are only a few developers.

Thanks,
-Kame

From: Rik van Riel on
On 06/15/2010 08:39 PM, KAMEZAWA Hiroyuki wrote:

> Hmm, or do you recommend no dirty-page writeback when a memcg hits its limit?
> Maybe we'll see a lot of swapping.
>
> I want to go with this for a while. Changing memcg's behavior will take
> some time, and there are only a few developers.

One thing we can do, for kswapd, memcg and direct reclaim alike,
is to tell the flusher threads to flush pages related to a pageout
candidate page to disk.

That way the reclaiming processes can wait on some disk IO to
finish, while the flusher thread takes care of the actual flushing.

That should also fix the "kswapd filesystem IO has really poor IO
patterns" issue.

There's no reason not to fix this issue the right way.
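
As a rough illustration of the hand-off described above, here is a
single-threaded userspace model. Everything in it is an assumption made for
the sketch: the toy_page and flush_queue types, the function names, and the
choice to sort queued pages by index before "writing" them are all
hypothetical, and a real flusher would run in its own thread while the
reclaimer sleeps until the IO completes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define FLUSH_QUEUE_MAX 16

/* Toy page: index is its position in the file, dirty means it needs IO. */
struct toy_page {
    unsigned long index;
    bool dirty;
};

/* Requests queued by reclaimers for the flusher to service. */
struct flush_queue {
    struct toy_page *pages[FLUSH_QUEUE_MAX];
    size_t count;
};

/* Reclaimer side: queue the page instead of writing it out directly. */
static bool request_flush(struct flush_queue *q, struct toy_page *page)
{
    assert(q != NULL && page != NULL);
    if (!page->dirty || q->count == FLUSH_QUEUE_MAX)
        return false;
    q->pages[q->count++] = page;
    return true;
}

/* Order queued pages by file index so the "IO" is issued sequentially. */
static int cmp_by_index(const void *a, const void *b)
{
    const struct toy_page *pa = *(struct toy_page *const *)a;
    const struct toy_page *pb = *(struct toy_page *const *)b;

    return (pa->index > pb->index) - (pa->index < pb->index);
}

/*
 * Flusher side: sort the queued pages, mark each one clean in index order,
 * and record that order in written[] (which needs room for
 * FLUSH_QUEUE_MAX entries). Returns the number of pages cleaned.
 */
static size_t flusher_run(struct flush_queue *q, unsigned long *written)
{
    size_t cleaned = q->count;

    qsort(q->pages, q->count, sizeof(q->pages[0]), cmp_by_index);
    for (size_t i = 0; i < q->count; i++) {
        q->pages[i]->dirty = false;
        written[i] = q->pages[i]->index;
    }
    q->count = 0;
    return cleaned;
}
```

In this model the reclaimer never issues IO; it only queues a request and
later observes that the page has become clean.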

--
All rights reversed