From: Balbir Singh on
* KAMEZAWA Hiroyuki <kamezawa.hiroyu(a)jp.fujitsu.com> [2010-03-02 17:23:16]:

> On Tue, 2 Mar 2010 09:01:58 +0100
> Andrea Righi <arighi(a)develer.com> wrote:
>
> > On Tue, Mar 02, 2010 at 09:23:09AM +0900, KAMEZAWA Hiroyuki wrote:
> > > On Mon, 1 Mar 2010 22:23:40 +0100
> > > Andrea Righi <arighi(a)develer.com> wrote:
> > >
> > > > Apply the cgroup dirty pages accounting and limiting infrastructure to
> > > > the opportune kernel functions.
> > > >
> > > > Signed-off-by: Andrea Righi <arighi(a)develer.com>
> > >
> > > Seems nice.
> > >
> > > Hmm. the last problem is moving account between memcg.
> > >
> > > Right ?
> >
> > Correct. This was actually the last item on the TODO list. Anyway, I'm
> > still considering whether it's correct to move dirty pages when a task is
> > migrated from one cgroup to another. Currently, dirty pages just remain in
> > the original cgroup and are flushed depending on the original cgroup's
> > settings. That is not totally wrong... at least moving the dirty pages
> > between memcgs should be optional (move_charge_at_immigrate?).
> >
>
> My concern is that migration between memcgs is already supported:
> - at task move
> - at rmdir
>
> So, if you leave the DIRTY_PAGE accounting in the original cgroup,
> the new cgroup's (migration target's) dirty page accounting may go
> negative or hold an incorrect value. Please check the FILE_MAPPED
> implementation in __mem_cgroup_move_account().
>
> As in:
>
>         if (page_mapped(page) && !PageAnon(page)) {
>                 /* Update mapped_file data for mem_cgroup */
>                 preempt_disable();
>                 __this_cpu_dec(from->stat->count[MEM_CGROUP_STAT_FILE_MAPPED]);
>                 __this_cpu_inc(to->stat->count[MEM_CGROUP_STAT_FILE_MAPPED]);
>                 preempt_enable();
>         }
>
> then, FILE_MAPPED never goes negative.
>

Absolutely! I am not sure how complex dirty memory migration will be,
but one way of working around it would be to disable migration of
charges when the feature is enabled (dirty* is set in the memory
cgroup). We might need additional logic to allow that to happen.

--
Three Cheers,
Balbir
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Kirill A. Shutemov on
On Tue, Mar 2, 2010 at 3:47 PM, Balbir Singh <balbir(a)linux.vnet.ibm.com> wrote:
> * Andrea Righi <arighi(a)develer.com> [2010-03-01 22:23:40]:
>
>> Apply the cgroup dirty pages accounting and limiting infrastructure to
>> the opportune kernel functions.
>>
>> Signed-off-by: Andrea Righi <arighi(a)develer.com>
>> ---
>>  fs/fuse/file.c      |    5 +++
>>  fs/nfs/write.c      |    4 ++
>>  fs/nilfs2/segment.c |   10 +++++-
>>  mm/filemap.c        |    1 +
>>  mm/page-writeback.c |   84 ++++++++++++++++++++++++++++++++------------------
>>  mm/rmap.c           |    4 +-
>>  mm/truncate.c       |    2 +
>>  7 files changed, 76 insertions(+), 34 deletions(-)
>>
>> diff --git a/fs/fuse/file.c b/fs/fuse/file.c
>> index a9f5e13..dbbdd53 100644
>> --- a/fs/fuse/file.c
>> +++ b/fs/fuse/file.c
>> @@ -11,6 +11,7 @@
>>  #include <linux/pagemap.h>
>>  #include <linux/slab.h>
>>  #include <linux/kernel.h>
>> +#include <linux/memcontrol.h>
>>  #include <linux/sched.h>
>>  #include <linux/module.h>
>>
>> @@ -1129,6 +1130,8 @@ static void fuse_writepage_finish(struct fuse_conn *fc, struct fuse_req *req)
>>
>>       list_del(&req->writepages_entry);
>>       dec_bdi_stat(bdi, BDI_WRITEBACK);
>> +     mem_cgroup_update_stat(req->pages[0],
>> +                     MEM_CGROUP_STAT_WRITEBACK_TEMP, -1);
>>       dec_zone_page_state(req->pages[0], NR_WRITEBACK_TEMP);
>>       bdi_writeout_inc(bdi);
>>       wake_up(&fi->page_waitq);
>> @@ -1240,6 +1243,8 @@ static int fuse_writepage_locked(struct page *page)
>>       req->inode = inode;
>>
>>       inc_bdi_stat(mapping->backing_dev_info, BDI_WRITEBACK);
>> +     mem_cgroup_update_stat(tmp_page,
>> +                     MEM_CGROUP_STAT_WRITEBACK_TEMP, 1);
>>       inc_zone_page_state(tmp_page, NR_WRITEBACK_TEMP);
>>       end_page_writeback(page);
>>
>> diff --git a/fs/nfs/write.c b/fs/nfs/write.c
>> index b753242..7316f7a 100644
>> --- a/fs/nfs/write.c
>> +++ b/fs/nfs/write.c
>
> Don't need memcontrol.h to be included here?

It's included via <linux/swap.h>.

> Looks OK to me overall, but there might be objection using the
> mem_cgroup_* naming convention, but I don't mind it very much :)
>
> --
>        Three Cheers,
>        Balbir
>
From: Vivek Goyal on
On Mon, Mar 01, 2010 at 11:18:31PM +0100, Andrea Righi wrote:
> On Mon, Mar 01, 2010 at 05:02:08PM -0500, Vivek Goyal wrote:
> > > @@ -686,10 +699,14 @@ void throttle_vm_writeout(gfp_t gfp_mask)
> > >                  */
> > >                 dirty_thresh += dirty_thresh / 10;      /* wheeee... */
> > >
> > > -               if (global_page_state(NR_UNSTABLE_NFS) +
> > > -                       global_page_state(NR_WRITEBACK) <= dirty_thresh)
> > > -                               break;
> > > -               congestion_wait(BLK_RW_ASYNC, HZ/10);
> > > +
> > > +               dirty = mem_cgroup_page_stat(MEMCG_NR_DIRTY_WRITEBACK_PAGES);
> > > +               if (dirty < 0)
> > > +                       dirty = global_page_state(NR_UNSTABLE_NFS) +
> > > +                               global_page_state(NR_WRITEBACK);
> >
> > dirty is unsigned long. As mentioned last time, above will never be true?
> > In general these patches look ok to me. I will do some testing with these.
>
> Re-introduced the same bug. My bad. :(
>
> The value returned from mem_cgroup_page_stat() can be negative, i.e.
> when memory cgroup is disabled. We could simply use a long for dirty,
> the unit is in # of pages so s64 should be enough. Or cast dirty to long
> only for the check (see below).
>
> Thanks!
> -Andrea
>
> Signed-off-by: Andrea Righi <arighi(a)develer.com>
> ---
> mm/page-writeback.c | 2 +-
> 1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/mm/page-writeback.c b/mm/page-writeback.c
> index d83f41c..dbee976 100644
> --- a/mm/page-writeback.c
> +++ b/mm/page-writeback.c
> @@ -701,7 +701,7 @@ void throttle_vm_writeout(gfp_t gfp_mask)
>
>
>                 dirty = mem_cgroup_page_stat(MEMCG_NR_DIRTY_WRITEBACK_PAGES);
> -               if (dirty < 0)
> +               if ((long)dirty < 0)

This will also be problematic, as on 32-bit systems your upper limit of
dirty memory will be 2G?

I guess I would prefer one of the two:

- return an error code from the function and pass a pointer to store the
  stats in as a function argument;

- or Peter's suggestion of checking mem_cgroup_has_dirty_limit(): if
  per-cgroup dirty control is enabled, then use the per-cgroup stats. In
  that case you don't have to return negative values.

The only tricky part will be careful accounting so that none of the stats
go negative in corner cases such as migration.

Thanks
Vivek

>                         dirty = global_page_state(NR_UNSTABLE_NFS) +
>                                 global_page_state(NR_WRITEBACK);
>                 if (dirty <= dirty_thresh)
From: Balbir Singh on
* Peter Zijlstra <peterz(a)infradead.org> [2010-03-02 14:48:56]:

> This is ugly and broken.. I thought you'd agreed to something like:
>
>         if (mem_cgroup_has_dirty_limit(cgroup))
>                 use mem_cgroup numbers
>         else
>                 use global numbers
>
> That allows for a 0 dirty limit (which should work and basically makes
> all io synchronous).
>
> Also, I'd put each of those in a separate function, like:
>
> unsigned long reclaimable_pages(cgroup)
> {
>         if (mem_cgroup_has_dirty_limit(cgroup))
>                 return mem_cgroup_page_stat(MEMCG_NR_RECLAIM_PAGES);
>
>         return global_page_state(NR_FILE_DIRTY) +
>                 global_page_state(NR_NFS_UNSTABLE);
> }
>

I agree, I should have been more specific about the naming convention,
this is what I meant - along these lines as we do with
zone_nr_lru_pages(), etc.

> Which raises another question: you should probably rebase on top of
> Trond's patches, which remove BDI_RECLAIMABLE, suggesting you also
> lose MEMCG_NR_RECLAIM_PAGES in favour of the DIRTY+UNSTABLE split.
>

--
Three Cheers,
Balbir
From: Trond Myklebust on
On Tue, 2010-03-02 at 14:48 +0100, Peter Zijlstra wrote:
> unsigned long reclaimable_pages(cgroup)
> {
>         if (mem_cgroup_has_dirty_limit(cgroup))
>                 return mem_cgroup_page_stat(MEMCG_NR_RECLAIM_PAGES);
>
>         return global_page_state(NR_FILE_DIRTY) +
>                 global_page_state(NR_NFS_UNSTABLE);
> }
>
> Which raises another question: you should probably rebase on top of
> Trond's patches, which remove BDI_RECLAIMABLE, suggesting you also
> lose MEMCG_NR_RECLAIM_PAGES in favour of the DIRTY+UNSTABLE split.
>

I'm dropping those patches for now. The main writeback change wasn't too
favourably received by the linux-mm community so I've implemented an
alternative that only changes the NFS layer, and doesn't depend on the
DIRTY+UNSTABLE split.

Cheers
Trond
