From: Minchan Kim on
I missed Hugh.

Minchan Kim wrote:
> Long time ago, we counted zero page as file_rss.
> But after reinstanted zero page, we don't do it.
> It means rss of process would be smaller than old.
>
> It could chage OOM victim selection.
>
> Kame reported following as
> "Before starting zero-page works, I checked "questions" in lkml and
> found some reports that some applications start to go OOM after zero-page
> removal.
>
> For me, I know one of my customer's application depends on behavior of
> zero page (on RHEL5). So, I tried to add again it before RHEL6 because
> I think removal of zero-page corrupts compatibility."
>
> So how about adding zero page as file_rss again for compatibility?
>
> Signed-off-by: Minchan Kim <minchan.kim(a)gmail.com>
> ---
> mm/memory.c | 7 +++++--
> 1 files changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/mm/memory.c b/mm/memory.c
> index 3743fb5..a4ba271 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -1995,6 +1995,7 @@ static int do_wp_page(struct mm_struct *mm, struct vm_area_struct *vma,
> int reuse = 0, ret = 0;
> int page_mkwrite = 0;
> struct page *dirty_page = NULL;
> + int zero_pfn = 0;
>
> old_page = vm_normal_page(vma, address, orig_pte);
> if (!old_page) {
> @@ -2117,7 +2118,8 @@ gotten:
> if (unlikely(anon_vma_prepare(vma)))
> goto oom;
>
> - if (is_zero_pfn(pte_pfn(orig_pte))) {
> + zero_pfn = is_zero_pfn(pte_pfn(orig_pte));
> + if (zero_pfn) {
> new_page = alloc_zeroed_user_highpage_movable(vma, address);
> if (!new_page)
> goto oom;
> @@ -2147,7 +2149,7 @@ gotten:
> */
> page_table = pte_offset_map_lock(mm, pmd, address, &ptl);
> if (likely(pte_same(*page_table, orig_pte))) {
> - if (old_page) {
> + if (old_page || zero_pfn) {
> if (!PageAnon(old_page)) {
> dec_mm_counter(mm, file_rss);
> inc_mm_counter(mm, anon_rss);
> @@ -2650,6 +2652,7 @@ static int do_anonymous_page(struct mm_struct *mm, struct vm_area_struct *vma,
> spin_lock(ptl);
> if (!pte_none(*page_table))
> goto unlock;
> + inc_mm_counter(mm, file_rss);
> goto setpte;
> }
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Minchan Kim on
On Thu, Dec 31, 2009 at 1:49 AM, Hugh Dickins
<hugh.dickins(a)tiscali.co.uk> wrote:
> On Mon, 28 Dec 2009, Minchan Kim wrote:
>> I missed Hugh.
>
> Thank you: it is sweet of you to say so :)
>
>>
>> Minchan Kim wrote:
>> > Long time ago, we counted zero page as file_rss.
>> > But after reinstanted zero page, we don't do it.
>> > It means rss of process would be smaller than old.
>> >
>> > It could chage OOM victim selection.
>
> Eh?  We don't use rss for OOM victim selection, we use total_vm.
>
> I know that's under discussion, and there are good arguments on
> both sides (I incline to the rss side, but see David's point about
> predictability); but here you seem to be making an argument for
> back-compatibility, yet there is no such issue in OOM victim selection.

Sorry, I totally confused that.

>
> And if we do decide that rss is appropriate for OOM victim selection,
> then we would prefer to keep the ZERO_PAGE out of rss, wouldn't we?

If we start to use RSS, it's good that keep zero page out of rss in OOM aspect.
But I am not sure it's good in smap aspect.
Some smap user might want to know max memory usage in process.
Zero page has a possibility to change real rss.

>
>> >
>> > Kame reported following as
>> > "Before starting zero-page works, I checked "questions" in lkml and
>> > found some reports that some applications start to go OOM after zero-page
>> > removal.
>> >
>> > For me, I know one of my customer's application depends on behavior of
>> > zero page (on RHEL5). So, I tried to add again it before RHEL6 because
>> > I think removal of zero-page corrupts compatibility."
>> >
>> > So how about adding zero page as file_rss again for compatibility?
>
> I think not.

At least, we had accounted zero page as file_rss until remove zero page.
That was my concern.
I think we have to fix this for above compatibility regardless of OOM issue.

>
> KAMEZAWA-san can correct me (when he returns in the New Year) if I'm
> wrong, but I don't think his customer's OOMs had anything to do with
> whether the ZERO_PAGE was counted in file_rss or not: the OOMs came
> from the fact that many pages were being used up where just the one
> ZERO_PAGE had been good before.  Wouldn't he have complained if the
> zero_pfn patches hadn't solved that problem?
>
> You are right that I completely overlooked the issue of whether to
> include the ZERO_PAGE in rss counts (now being a !vm_normal_page,
> it was just natural to leave it out); and I overlooked the fact that
> it used to be counted into file_rss in the old days (being !PageAnon).
>
> So I'm certainly at fault for that, and thank you for bringing the
> issue to attention; but once considered, I can't actually see a good
> reason why we should add code to count ZERO_PAGEs into file_rss now.
> And if this patch falls, then 1/3 and 3/3 would fall also.
>
> And the patch below would be incomplete anyway, wouldn't it?
> There would need to be a matching change to zap_pte_range(),
> but I don't see that.

Thanks.
If we think this patch is need, I will repost path with fix it.

What do you think?

--
Kind regards,
Minchan Kim
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: KAMEZAWA Hiroyuki on
On Wed, 30 Dec 2009 16:49:52 +0000 (GMT)
Hugh Dickins <hugh.dickins(a)tiscali.co.uk> wrote:

> > >
> > > Kame reported following as
> > > "Before starting zero-page works, I checked "questions" in lkml and
> > > found some reports that some applications start to go OOM after zero-page
> > > removal.
> > >
> > > For me, I know one of my customer's application depends on behavior of
> > > zero page (on RHEL5). So, I tried to add again it before RHEL6 because
> > > I think removal of zero-page corrupts compatibility."
> > >
> > > So how about adding zero page as file_rss again for compatibility?
>
> I think not.
>
> KAMEZAWA-san can correct me (when he returns in the New Year) if I'm
> wrong, but I don't think his customer's OOMs had anything to do with
> whether the ZERO_PAGE was counted in file_rss or not: the OOMs came
> from the fact that many pages were being used up where just the one
> ZERO_PAGE had been good before. Wouldn't he have complained if the
> zero_pfn patches hadn't solved that problem?
>
> You are right that I completely overlooked the issue of whether to
> include the ZERO_PAGE in rss counts (now being a !vm_normal_page,
> it was just natural to leave it out); and I overlooked the fact that
> it used to be counted into file_rss in the old days (being !PageAnon).
>
> So I'm certainly at fault for that, and thank you for bringing the
> issue to attention; but once considered, I can't actually see a good
> reason why we should add code to count ZERO_PAGEs into file_rss now.
> And if this patch falls, then 1/3 and 3/3 would fall also.
>
> And the patch below would be incomplete anyway, wouldn't it?
> There would need to be a matching change to zap_pte_range(),
> but I don't see that.
>
> We really don't want to be adding more and more ZERO_PAGE/zero_pfn
> tests around the place if we can avoid them: KOSAKI-san has a strong
> argument for adding such a test in kernel/futex.c, but I don't the
> argument here.
>

I agree that ZERO_PAGE shouldn't be counted as rss. Now, I feel that old
counting method(in old zero-page implementation) was bad.

Minchan-san, I'm sorry for noise.

Thanks,
-Kame



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Minchan Kim on
On Mon, Jan 4, 2010 at 8:43 AM, KAMEZAWA Hiroyuki
<kamezawa.hiroyu(a)jp.fujitsu.com> wrote:
> On Wed, 30 Dec 2009 16:49:52 +0000 (GMT)
> Hugh Dickins <hugh.dickins(a)tiscali.co.uk> wrote:
>
>> > >
>> > > Kame reported following as
>> > > "Before starting zero-page works, I checked "questions" in lkml and
>> > > found some reports that some applications start to go OOM after zero-page
>> > > removal.
>> > >
>> > > For me, I know one of my customer's application depends on behavior of
>> > > zero page (on RHEL5). So, I tried to add again it before RHEL6 because
>> > > I think removal of zero-page corrupts compatibility."
>> > >
>> > > So how about adding zero page as file_rss again for compatibility?
>>
>> I think not.
>>
>> KAMEZAWA-san can correct me (when he returns in the New Year) if I'm
>> wrong, but I don't think his customer's OOMs had anything to do with
>> whether the ZERO_PAGE was counted in file_rss or not: the OOMs came
>> from the fact that many pages were being used up where just the one
>> ZERO_PAGE had been good before.  Wouldn't he have complained if the
>> zero_pfn patches hadn't solved that problem?
>>
>> You are right that I completely overlooked the issue of whether to
>> include the ZERO_PAGE in rss counts (now being a !vm_normal_page,
>> it was just natural to leave it out); and I overlooked the fact that
>> it used to be counted into file_rss in the old days (being !PageAnon).
>>
>> So I'm certainly at fault for that, and thank you for bringing the
>> issue to attention; but once considered, I can't actually see a good
>> reason why we should add code to count ZERO_PAGEs into file_rss now.
>> And if this patch falls, then 1/3 and 3/3 would fall also.
>>
>> And the patch below would be incomplete anyway, wouldn't it?
>> There would need to be a matching change to zap_pte_range(),
>> but I don't see that.
>>
>> We really don't want to be adding more and more ZERO_PAGE/zero_pfn
>> tests around the place if we can avoid them: KOSAKI-san has a strong
>> argument for adding such a test in kernel/futex.c, but I don't the
>> argument here.
>>
>
> I agree that ZERO_PAGE shouldn't be counted as rss. Now, I feel that old
> counting method(in old zero-page implementation) was bad.
>
> Minchan-san, I'm sorry for noise.

That's all right.
It was my mistake.

I will drop this and repost Matt and Hugh's ACK version.
Thanks for all. :)

>
> Thanks,
> -Kame
>
>
>
>



--
Kind regards,
Minchan Kim
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Minchan Kim on
On Thu, Dec 31, 2009 at 5:43 PM, Hugh Dickins
<hugh.dickins(a)tiscali.co.uk> wrote:
> On Thu, 31 Dec 2009, Minchan Kim wrote:
>> On Thu, Dec 31, 2009 at 1:49 AM, Hugh Dickins
>> <hugh.dickins(a)tiscali.co.uk> wrote:
>> >
>> > You are right that I completely overlooked the issue of whether to
>> > include the ZERO_PAGE in rss counts (now being a !vm_normal_page,
>> > it was just natural to leave it out); and I overlooked the fact that
>> > it used to be counted into file_rss in the old days (being !PageAnon).
>> >
>> > So I'm certainly at fault for that, and thank you for bringing the
>> > issue to attention; but once considered, I can't actually see a good
>> > reason why we should add code to count ZERO_PAGEs into file_rss now.
>> > And if this patch falls, then 1/3 and 3/3 would fall also.
>> >
>> > And the patch below would be incomplete anyway, wouldn't it?
>> > There would need to be a matching change to zap_pte_range(),
>> > but I don't see that.
>>
>> Thanks.
>> If we think this patch is need, I will repost path with fix it.
>>
>> What do you think?
>
> If someone comes up with a convincing case in which their system
> is behaving significantly worse because the zero page is not being
> counted in rss now, then we shall probably want your patch.
> But I still don't yet see a reason to add code just to keep the
> anon_rss+file_rss number looking the same as it was before, in this
> exceptional case where there's a significant number of zero pages.

Okay. I understand your point.
I will repost this patch when we need this. :)
Thanks for careful review, Hugh.

>
> Hugh
>



--
Kind regards,
Minchan Kim
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/