From: Mel Gorman on
On Tue, Jun 15, 2010 at 09:37:27AM -0400, Christoph Hellwig wrote:
> On Tue, Jun 15, 2010 at 09:34:18AM -0400, Rik van Riel wrote:
> > If direct reclaim can overflow the stack, so can direct
> > memcg reclaim. That means this patch does not solve the
> > stack overflow, while admitting that we do need the
> > ability to get specific pages flushed to disk from the
> > pageout code.
>
> Can you explain what the hell memcg reclaim is and why it needs
> to reclaim from random contexts?

Kamezawa Hiroyuki has the full story here but here is a summary.

memcg is the Memory Controller cgroup
(Documentation/cgroups/memory.txt). It's intended for the control of the
amount of memory usable by a group of processes but its behaviour in
terms of reclaim differs from global reclaim. It has its own LRU lists
and kswapd operates on them. What is surprising is that direct reclaim
for a process in the control group also does not operate within the
cgroup.

Reclaim from a cgroup happens from the fault path. The new page is
"charged" to the cgroup. If the cgroup then exceeds its allocated
resources, some pages within the group are reclaimed in a path that is
similar to direct reclaim except for its entry point.
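
As a rough illustration of the charge-then-reclaim behaviour described above, here is a minimal userspace sketch. It is not the kernel's implementation: `struct toy_memcg`, `toy_memcg_charge()` and `toy_memcg_reclaim()` are hypothetical names, and "reclaim" is reduced to decrementing a counter. It only shows that reclaim is triggered by a charge that would exceed the group's limit, and that it takes pages from the group itself.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: names and fields are illustrative, not the kernel's. */
struct toy_memcg {
    size_t limit;     /* maximum pages the group may hold */
    size_t usage;     /* pages currently charged to the group */
    size_t reclaimed; /* pages reclaimed so far, kept for inspection */
};

/* Free pages from the group itself until nr_pages more would fit. */
static size_t toy_memcg_reclaim(struct toy_memcg *cg, size_t nr_pages)
{
    size_t freed = 0;

    while (cg->usage > 0 && cg->usage + nr_pages > cg->limit) {
        cg->usage--;     /* pretend one page from the group's LRU was freed */
        cg->reclaimed++;
        freed++;
    }
    return freed;
}

/*
 * Charge nr_pages to the group, as the fault path would for a new page.
 * Returns 0 on success, -1 if the request can never fit under the limit.
 */
static int toy_memcg_charge(struct toy_memcg *cg, size_t nr_pages)
{
    assert(cg != NULL);

    if (nr_pages > cg->limit)
        return -1;
    if (cg->usage + nr_pages > cg->limit)
        toy_memcg_reclaim(cg, nr_pages); /* entry point is the charge, not the allocator */
    cg->usage += nr_pages;
    assert(cg->usage <= cg->limit);
    return 0;
}
```

In this model a group with a limit of 4 pages that is already full reclaims exactly one of its own pages when a fifth page is charged.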

So memcg is not reclaiming from a random context. There is a limited
number of cases where a memcg reclaims, and it is not expected to
overflow the stack.

> It seems everything that has a cg in its name that I stumbled over
> lately seems to be some ugly wart..
>

The wart in this case is that the behaviour of page reclaim within a
memcg differs a fair bit from global reclaim.

--
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Mel Gorman on
On Tue, Jun 15, 2010 at 09:34:18AM -0400, Rik van Riel wrote:
> On 06/15/2010 07:45 AM, Mel Gorman wrote:
>> On Mon, Jun 14, 2010 at 05:55:51PM -0400, Rik van Riel wrote:
>>> On 06/14/2010 07:17 AM, Mel Gorman wrote:
>>>
>>>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>>>> index 4856a2a..574e816 100644
>>>> --- a/mm/vmscan.c
>>>> +++ b/mm/vmscan.c
>>>> @@ -372,6 +372,12 @@ int write_reclaim_page(struct page *page, struct address_space *mapping,
>>>>  	return PAGE_SUCCESS;
>>>>  }
>>>>
>>>> +/* kswapd and memcg can writeback as they are unlikely to overflow stack */
>>>> +static inline bool reclaim_can_writeback(struct scan_control *sc)
>>>> +{
>>>> +	return current_is_kswapd() || sc->mem_cgroup != NULL;
>>>> +}
>>>> +
>>>
>>> I'm not entirely convinced on this bit, but am willing to
>>> be convinced by the data.
>>>
>>
>> Which bit?
>>
>> You're not convinced that kswapd should be allowed to write back?
>> You're not convinced that memcg should be allowed to write back?
>> You're not convinced that direct reclaim writing back pages can overflow
>> the stack?
>
> If direct reclaim can overflow the stack, so can direct
> memcg reclaim. That means this patch does not solve the
> stack overflow, while admitting that we do need the
> ability to get specific pages flushed to disk from the
> pageout code.
>

What path is taken with memcg != NULL that could overflow the stack? I
couldn't spot one but mm/memcontrol.c is a bit tangled so finding all
its use cases is tricky. The critical path I had in mind though was
direct reclaim and for that path, memcg == NULL or did I miss something?
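
To make the three cases concrete, the predicate from the quoted patch can be exercised in userspace. This is only a sketch under stated assumptions: `struct mem_cgroup` and `struct scan_control` are cut-down stand-ins with just the field the predicate reads, and `toy_is_kswapd` is a hypothetical flag replacing the kernel's per-task check behind `current_is_kswapd()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-ins for kernel types; only what the predicate reads is modelled. */
struct mem_cgroup { int id; };

struct scan_control {
    struct mem_cgroup *mem_cgroup; /* NULL means global reclaim */
};

/* Hypothetical flag replacing the kernel's per-task kswapd check. */
static bool toy_is_kswapd;

static bool current_is_kswapd(void)
{
    return toy_is_kswapd;
}

/* Same expression as the quoted patch, compiled against the stand-ins above. */
static inline bool reclaim_can_writeback(struct scan_control *sc)
{
    return current_is_kswapd() || sc->mem_cgroup != NULL;
}
```

With these stand-ins, a non-kswapd caller with `mem_cgroup == NULL` (the direct reclaim case) is refused writeback, while kswapd and any caller with a non-NULL `mem_cgroup` are allowed it.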

--
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab
From: Rik van Riel on
On 06/15/2010 09:59 AM, Mel Gorman wrote:
> On Tue, Jun 15, 2010 at 09:34:18AM -0400, Rik van Riel wrote:
>> On 06/15/2010 07:45 AM, Mel Gorman wrote:

>>>>>
>>>>> +/* kswapd and memcg can writeback as they are unlikely to overflow stack */
>>>>> +static inline bool reclaim_can_writeback(struct scan_control *sc)
>>>>> +{
>>>>> +	return current_is_kswapd() || sc->mem_cgroup != NULL;
>>>>> +}

>> If direct reclaim can overflow the stack, so can direct
>> memcg reclaim. That means this patch does not solve the
>> stack overflow, while admitting that we do need the
>> ability to get specific pages flushed to disk from the
>> pageout code.
>>
>
> What path is taken with memcg != NULL that could overflow the stack? I
> couldn't spot one but mm/memcontrol.c is a bit tangled so finding all
> its use cases is tricky. The critical path I had in mind though was
> direct reclaim and for that path, memcg == NULL or did I miss something?

mem_cgroup_hierarchical_reclaim -> try_to_free_mem_cgroup_pages

--
All rights reversed
From: Mel Gorman on
On Tue, Jun 15, 2010 at 10:04:24AM -0400, Rik van Riel wrote:
> On 06/15/2010 09:59 AM, Mel Gorman wrote:
>> On Tue, Jun 15, 2010 at 09:34:18AM -0400, Rik van Riel wrote:
>>> On 06/15/2010 07:45 AM, Mel Gorman wrote:
>
>>>>>>
>>>>>> +/* kswapd and memcg can writeback as they are unlikely to overflow stack */
>>>>>> +static inline bool reclaim_can_writeback(struct scan_control *sc)
>>>>>> +{
>>>>>> +	return current_is_kswapd() || sc->mem_cgroup != NULL;
>>>>>> +}
>
>>> If direct reclaim can overflow the stack, so can direct
>>> memcg reclaim. That means this patch does not solve the
>>> stack overflow, while admitting that we do need the
>>> ability to get specific pages flushed to disk from the
>>> pageout code.
>>>
>>
>> What path is taken with memcg != NULL that could overflow the stack? I
>> couldn't spot one but mm/memcontrol.c is a bit tangled so finding all
>> its use cases is tricky. The critical path I had in mind though was
>> direct reclaim and for that path, memcg == NULL or did I miss something?
>
> mem_cgroup_hierarchical_reclaim -> try_to_free_mem_cgroup_pages
>

But in turn, where is mem_cgroup_hierarchical_reclaim called from direct
reclaim? It appears to be only called from the fault path or as a result
of the memcg changing size.

--
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab
From: Rik van Riel on
On 06/15/2010 09:37 AM, Christoph Hellwig wrote:
> On Tue, Jun 15, 2010 at 09:34:18AM -0400, Rik van Riel wrote:
>> If direct reclaim can overflow the stack, so can direct
>> memcg reclaim. That means this patch does not solve the
>> stack overflow, while admitting that we do need the
>> ability to get specific pages flushed to disk from the
>> pageout code.
>
> Can you explain what the hell memcg reclaim is and why it needs
> to reclaim from random contexts?

The page fault code will call the cgroup accounting code.

When a cgroup goes over its memory limit, __mem_cgroup_try_charge
will call mem_cgroup_hierarchical_reclaim, which will then go
into the page reclaim code.
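
The call chain can be sketched with userspace stubs that simply record the order in which they run. Everything here is an assumption made for illustration: the `toy_` functions are named after the kernel functions they stand for, but their bodies and the `over_limit` parameter are invented, and the real functions take different arguments.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TRACE_MAX 8

static const char *trace[TRACE_MAX]; /* names of the functions entered, in order */
static size_t trace_len;

static void record(const char *fn)
{
    assert(trace_len < TRACE_MAX);
    trace[trace_len++] = fn;
}

/* Returns nonzero if the i-th recorded call has the given name. */
static int trace_is(size_t i, const char *name)
{
    return i < trace_len && strcmp(trace[i], name) == 0;
}

/* Stands for try_to_free_mem_cgroup_pages: the entry into page reclaim. */
static void toy_try_to_free_mem_cgroup_pages(void)
{
    record("try_to_free_mem_cgroup_pages");
}

/* Stands for mem_cgroup_hierarchical_reclaim. */
static void toy_hierarchical_reclaim(void)
{
    record("mem_cgroup_hierarchical_reclaim");
    toy_try_to_free_mem_cgroup_pages();
}

/* Stands for __mem_cgroup_try_charge; over_limit replaces the real limit check. */
static void toy_try_charge(int over_limit)
{
    record("__mem_cgroup_try_charge");
    if (over_limit)
        toy_hierarchical_reclaim();
}

/* Hypothetical entry point representing the page fault code. */
static void toy_page_fault(int over_limit)
{
    record("page fault");
    toy_try_charge(over_limit);
}
```

Calling `toy_page_fault(1)` records all four steps in order, while `toy_page_fault(0)` stops after the charge, which mirrors the point that reclaim is entered only when the cgroup is over its limit.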

> It seems everything that has a cg in its name that I stumbled over
> lately seems to be some ugly wart..

No argument there. It took me a few minutes to find the code
path above :)

--
All rights reversed