From: Andrea Righi on
On Wed, Mar 03, 2010 at 08:21:07AM +0900, Daisuke Nishimura wrote:
> On Tue, 2 Mar 2010 23:18:23 +0100, Andrea Righi <arighi(a)develer.com> wrote:
> > On Tue, Mar 02, 2010 at 07:20:26PM +0530, Balbir Singh wrote:
> > > * KAMEZAWA Hiroyuki <kamezawa.hiroyu(a)jp.fujitsu.com> [2010-03-02 17:23:16]:
> > >
> > > > On Tue, 2 Mar 2010 09:01:58 +0100
> > > > Andrea Righi <arighi(a)develer.com> wrote:
> > > >
> > > > > On Tue, Mar 02, 2010 at 09:23:09AM +0900, KAMEZAWA Hiroyuki wrote:
> > > > > > On Mon, 1 Mar 2010 22:23:40 +0100
> > > > > > Andrea Righi <arighi(a)develer.com> wrote:
> > > > > >
> > > > > > > Apply the cgroup dirty pages accounting and limiting infrastructure to
> > > > > > > the opportune kernel functions.
> > > > > > >
> > > > > > > Signed-off-by: Andrea Righi <arighi(a)develer.com>
> > > > > >
> > > > > > Seems nice.
> > > > > >
> > > > > > Hmm. the last problem is moving account between memcg.
> > > > > >
> > > > > > Right ?
> > > > >
> > > > > Correct. This was actually the last item of the TODO list. Anyway, I'm
> > > > > still considering if it's correct to move dirty pages when a task is
> > > > > migrated from a cgroup to another. Currently, dirty pages just remain in
> > > > > the original cgroup and are flushed depending on the original cgroup
> > > > > settings. That is not totally wrong... at least moving the dirty pages
> > > > > between memcgs should be optional (move_charge_at_immigrate?).
> > > > >
> > > >
> > > > My concern is
> > > > - migration between memcg is already supported
> > > > - at task move
> > > > - at rmdir
> > > >
> > > > Then, if you leave DIRTY_PAGE accounting in the original cgroup,
> > > > the new cgroup (migration target)'s dirty page accounting may
> > > > go negative or hold an incorrect value. Please check the FILE_MAPPED
> > > > implementation in __mem_cgroup_move_account()
> > > >
> > > > As
> > > > if (page_mapped(page) && !PageAnon(page)) {
> > > >         /* Update mapped_file data for mem_cgroup */
> > > >         preempt_disable();
> > > >         __this_cpu_dec(from->stat->count[MEM_CGROUP_STAT_FILE_MAPPED]);
> > > >         __this_cpu_inc(to->stat->count[MEM_CGROUP_STAT_FILE_MAPPED]);
> > > >         preempt_enable();
> > > > }
> > > > then, FILE_MAPPED never goes negative.
> > > >
> > >
> > > Absolutely! I am not sure how complex dirty memory migration will be,
> > > but one way of working around it would be to disable migration of
> > > charges when the feature is enabled (dirty* is set in the memory
> > > cgroup). We might need additional logic to allow that to happen.
> >
> > I've started to look at dirty memory migration. My first attempt is to add
> > DIRTY, WRITEBACK, etc. to the page_cgroup flags and handle them in
> > __mem_cgroup_move_account(). Probably I'll have something ready for the
> > next version of the patch. I still need to figure out whether this can
> > work as expected...
> >
> I agree it's the right direction (in fact, I have been planning to post a patch
> in that direction), so I leave it to you.
> Can you add a PCG_FILE_MAPPED flag too? I think this flag can be handled in the
> same way as the other flags you're trying to add, and we can change
> "if (page_mapped(page) && !PageAnon(page))" to "if (PageCgroupFileMapped(pc))"
> in __mem_cgroup_move_account(). It would be cleaner than the current code, IMHO.

OK, sounds good to me. I'll introduce PCG_FILE_MAPPED in the next
version.
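
Something along these lines, I guess (just a rough sketch to make sure we mean
the same thing, not the actual patch; the assumption is that the flag gets
set/cleared in the same paths that already update the FILE_MAPPED counter):

/* page_cgroup.h: new flag bit plus the usual accessors */
        PCG_FILE_MAPPED, /* page is accounted as FILE_MAPPED in its memcg */

SETPCGFLAG(FileMapped, FILE_MAPPED)
CLEARPCGFLAG(FileMapped, FILE_MAPPED)
TESTPCGFLAG(FileMapped, FILE_MAPPED)

/* __mem_cgroup_move_account(): test the page_cgroup flag instead of the page
 * state, so the statistics always follow what was actually accounted */
        if (PageCgroupFileMapped(pc)) {
                /* Update mapped_file data for mem_cgroup */
                preempt_disable();
                __this_cpu_dec(from->stat->count[MEM_CGROUP_STAT_FILE_MAPPED]);
                __this_cpu_inc(to->stat->count[MEM_CGROUP_STAT_FILE_MAPPED]);
                preempt_enable();
        }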

Thanks,
-Andrea
From: Andrea Righi on
On Tue, Mar 02, 2010 at 06:59:32PM -0500, Vivek Goyal wrote:
> On Tue, Mar 02, 2010 at 11:22:48PM +0100, Andrea Righi wrote:
> > On Tue, Mar 02, 2010 at 10:05:29AM -0500, Vivek Goyal wrote:
> > > On Mon, Mar 01, 2010 at 11:18:31PM +0100, Andrea Righi wrote:
> > > > On Mon, Mar 01, 2010 at 05:02:08PM -0500, Vivek Goyal wrote:
> > > > > > @@ -686,10 +699,14 @@ void throttle_vm_writeout(gfp_t gfp_mask)
> > > > > > */
> > > > > > dirty_thresh += dirty_thresh / 10; /* wheeee... */
> > > > > >
> > > > > > - if (global_page_state(NR_UNSTABLE_NFS) +
> > > > > > - global_page_state(NR_WRITEBACK) <= dirty_thresh)
> > > > > > - break;
> > > > > > - congestion_wait(BLK_RW_ASYNC, HZ/10);
> > > > > > +
> > > > > > + dirty = mem_cgroup_page_stat(MEMCG_NR_DIRTY_WRITEBACK_PAGES);
> > > > > > + if (dirty < 0)
> > > > > > + dirty = global_page_state(NR_UNSTABLE_NFS) +
> > > > > > + global_page_state(NR_WRITEBACK);
> > > > >
> > > > > dirty is unsigned long. As mentioned last time, above will never be true?
> > > > > In general these patches look ok to me. I will do some testing with these.
> > > >
> > > > Re-introduced the same bug. My bad. :(
> > > >
> > > > The value returned from mem_cgroup_page_stat() can be negative, e.g.
> > > > when the memory cgroup subsystem is disabled. We could simply use a
> > > > signed long for dirty (the unit is # of pages, so s64 would also be
> > > > enough), or cast dirty to long only for the check (see below).
> > > >
> > > > Thanks!
> > > > -Andrea
> > > >
> > > > Signed-off-by: Andrea Righi <arighi(a)develer.com>
> > > > ---
> > > > mm/page-writeback.c | 2 +-
> > > > 1 files changed, 1 insertions(+), 1 deletions(-)
> > > >
> > > > diff --git a/mm/page-writeback.c b/mm/page-writeback.c
> > > > index d83f41c..dbee976 100644
> > > > --- a/mm/page-writeback.c
> > > > +++ b/mm/page-writeback.c
> > > > @@ -701,7 +701,7 @@ void throttle_vm_writeout(gfp_t gfp_mask)
> > > >
> > > >
> > > > dirty = mem_cgroup_page_stat(MEMCG_NR_DIRTY_WRITEBACK_PAGES);
> > > > - if (dirty < 0)
> > > > + if ((long)dirty < 0)
> > >
> > > This will also be problematic, as on 32-bit systems your upper limit of
> > > dirty memory will be 2G?
> > >
> > > I guess, I will prefer one of the two.
> > >
> > > - return an error code from the function and pass a pointer to store the
> > > stats in as a function argument.
> > >
> > > - Or Peter's suggestion of checking mem_cgroup_has_dirty_limit(): if
> > > per-cgroup dirty control is enabled, then use the per-cgroup stats. In
> > > that case you don't have to return negative values.
> > >
> > > The only tricky part will be careful accounting, so that none of the
> > > stats go negative in corner cases such as migration.
> >
> > What do you think about Peter's suggestion + the locking stuff? (see the
> > previous email). Otherwise, I'll choose the other solution; passing a
> > pointer and always returning an error code is not bad.
> >
>
> Ok, so you are worried that by the time we finish the mem_cgroup_has_dirty_limit()
> call, the task might change cgroup, and later we might call
> mem_cgroup_get_page_stat() on a different cgroup altogether, which might or
> might not have dirty limits specified?

Correct.

>
> But in what cases do you not want to use the memory cgroup's specified limit?
> I thought the cgroup being disabled was the only case where we need to use
> the global limits. Otherwise a memory cgroup will either have dirty_bytes
> specified or by default inherit the global dirty_ratio, which is a valid
> number. If that's the case then you don't have to take rcu_read_lock()
> outside get_page_stat()?
>
> IOW, apart from cgroup being disabled, what are the other cases where you
> expect to not use cgroup's page stat and use global stats?

At boot, when mem_cgroup_from_task() may return NULL. But this is not
related to the RCU acquisition.

Anyway, probably the RCU protection is not so critical for this
particular case, and we can simply get rid of it. In this way we can
easily implement the interface proposed by Peter.
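
Something like this in throttle_vm_writeout(), I mean (a sketch of the idea
only, assuming mem_cgroup_has_dirty_limit() ends up taking no arguments and
simply returning false when the memory cgroup subsystem is disabled):

        unsigned long dirty;

        if (mem_cgroup_has_dirty_limit())
                /* per-cgroup dirty limits are meaningful: use memcg stats */
                dirty = mem_cgroup_page_stat(MEMCG_NR_DIRTY_WRITEBACK_PAGES);
        else
                /* memcg disabled: fall back to the global counters */
                dirty = global_page_state(NR_UNSTABLE_NFS) +
                        global_page_state(NR_WRITEBACK);

        if (dirty <= dirty_thresh)
                break;
        congestion_wait(BLK_RW_ASYNC, HZ/10);

This way mem_cgroup_page_stat() never has to encode an error in its return
value.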

-Andrea
From: Andrea Righi on
On Wed, Mar 03, 2010 at 12:47:03PM +0100, Andrea Righi wrote:
> On Tue, Mar 02, 2010 at 06:59:32PM -0500, Vivek Goyal wrote:
> > On Tue, Mar 02, 2010 at 11:22:48PM +0100, Andrea Righi wrote:
> > > On Tue, Mar 02, 2010 at 10:05:29AM -0500, Vivek Goyal wrote:
> > > > On Mon, Mar 01, 2010 at 11:18:31PM +0100, Andrea Righi wrote:
> > > > > On Mon, Mar 01, 2010 at 05:02:08PM -0500, Vivek Goyal wrote:
> > > > > > > @@ -686,10 +699,14 @@ void throttle_vm_writeout(gfp_t gfp_mask)
> > > > > > > */
> > > > > > > dirty_thresh += dirty_thresh / 10; /* wheeee... */
> > > > > > >
> > > > > > > - if (global_page_state(NR_UNSTABLE_NFS) +
> > > > > > > - global_page_state(NR_WRITEBACK) <= dirty_thresh)
> > > > > > > - break;
> > > > > > > - congestion_wait(BLK_RW_ASYNC, HZ/10);
> > > > > > > +
> > > > > > > + dirty = mem_cgroup_page_stat(MEMCG_NR_DIRTY_WRITEBACK_PAGES);
> > > > > > > + if (dirty < 0)
> > > > > > > + dirty = global_page_state(NR_UNSTABLE_NFS) +
> > > > > > > + global_page_state(NR_WRITEBACK);
> > > > > >
> > > > > > dirty is unsigned long. As mentioned last time, above will never be true?
> > > > > > In general these patches look ok to me. I will do some testing with these.
> > > > >
> > > > > Re-introduced the same bug. My bad. :(
> > > > >
> > > > > The value returned from mem_cgroup_page_stat() can be negative, e.g.
> > > > > when the memory cgroup subsystem is disabled. We could simply use a
> > > > > signed long for dirty (the unit is # of pages, so s64 would also be
> > > > > enough), or cast dirty to long only for the check (see below).
> > > > >
> > > > > Thanks!
> > > > > -Andrea
> > > > >
> > > > > Signed-off-by: Andrea Righi <arighi(a)develer.com>
> > > > > ---
> > > > > mm/page-writeback.c | 2 +-
> > > > > 1 files changed, 1 insertions(+), 1 deletions(-)
> > > > >
> > > > > diff --git a/mm/page-writeback.c b/mm/page-writeback.c
> > > > > index d83f41c..dbee976 100644
> > > > > --- a/mm/page-writeback.c
> > > > > +++ b/mm/page-writeback.c
> > > > > @@ -701,7 +701,7 @@ void throttle_vm_writeout(gfp_t gfp_mask)
> > > > >
> > > > >
> > > > > dirty = mem_cgroup_page_stat(MEMCG_NR_DIRTY_WRITEBACK_PAGES);
> > > > > - if (dirty < 0)
> > > > > + if ((long)dirty < 0)
> > > >
> > > > This will also be problematic, as on 32-bit systems your upper limit of
> > > > dirty memory will be 2G?
> > > >
> > > > I guess, I will prefer one of the two.
> > > >
> > > > - return an error code from the function and pass a pointer to store the
> > > > stats in as a function argument.
> > > >
> > > > - Or Peter's suggestion of checking mem_cgroup_has_dirty_limit(): if
> > > > per-cgroup dirty control is enabled, then use the per-cgroup stats. In
> > > > that case you don't have to return negative values.
> > > >
> > > > The only tricky part will be careful accounting, so that none of the
> > > > stats go negative in corner cases such as migration.
> > >
> > > What do you think about Peter's suggestion + the locking stuff? (see the
> > > previous email). Otherwise, I'll choose the other solution; passing a
> > > pointer and always returning an error code is not bad.
> > >
> >
> > Ok, so you are worried that by the time we finish the mem_cgroup_has_dirty_limit()
> > call, the task might change cgroup, and later we might call
> > mem_cgroup_get_page_stat() on a different cgroup altogether, which might or
> > might not have dirty limits specified?
>
> Correct.
>
> >
> > But in what cases do you not want to use the memory cgroup's specified limit?
> > I thought the cgroup being disabled was the only case where we need to use
> > the global limits. Otherwise a memory cgroup will either have dirty_bytes
> > specified or by default inherit the global dirty_ratio, which is a valid
> > number. If that's the case then you don't have to take rcu_read_lock()
> > outside get_page_stat()?
> >
> > IOW, apart from cgroup being disabled, what are the other cases where you
> > expect to not use cgroup's page stat and use global stats?
>
> At boot, when mem_cgroup_from_task() may return NULL. But this is not
> related to the RCU acquisition.

Never mind. You're right. In any case, even if a task is migrated to a
different cgroup, mem_cgroup_has_dirty_limit() will still be true for it.

So RCU protection is not needed outside these functions.

OK, I'll go with Peter's suggestion.

Thanks!
-Andrea
From: Andrea Righi on
On Wed, Mar 03, 2010 at 11:07:35AM +0100, Peter Zijlstra wrote:
> On Tue, 2010-03-02 at 23:14 +0100, Andrea Righi wrote:
> >
> > I agree mem_cgroup_has_dirty_limit() is nicer. But we must do that under
> > RCU, so something like:
> >
> > rcu_read_lock();
> > if (mem_cgroup_has_dirty_limit())
> >         mem_cgroup_get_page_stat()
> > else
> >         global_page_state()
> > rcu_read_unlock();
> >
> > That is bad when mem_cgroup_has_dirty_limit() always returns false
> > (e.g., when memory cgroups are disabled). So I fell back to the old
> > interface.
>
> Why is it that mem_cgroup_has_dirty_limit() needs RCU when
> mem_cgroup_get_page_stat() doesn't? That is, simply make
> mem_cgroup_has_dirty_limit() not require RCU in the same way
> *_get_page_stat() doesn't either.

OK, I agree we can get rid of RCU protection here (see my previous
email).

BTW the point was that after mem_cgroup_has_dirty_limit() the task might
be moved to another cgroup, but even in that case mem_cgroup_has_dirty_limit()
will always be true, so mem_cgroup_get_page_stat() is always coherent.

>
> > What do you think about:
> >
> > mem_cgroup_lock();
> > if (mem_cgroup_has_dirty_limit())
> >         mem_cgroup_get_page_stat()
> > else
> >         global_page_state()
> > mem_cgroup_unlock();
> >
> > Where mem_cgroup_lock()/mem_cgroup_unlock() simply expand to nothing when
> > memory cgroups are disabled.
>
> I think you're engineering the wrong way around.
>
> > >
> > > That allows for a 0 dirty limit (which should work and basically makes
> > > all io synchronous).
> >
> > IMHO it is better to reserve 0 for the special value "disabled", like the
> > global settings. Synchronous I/O can also be achieved using a dirty
> > limit of 1.
>
> Why?! 0 clearly states no writeback cache, IOW sync writes; a 1
> byte/page writeback cache effectively reduces to the same thing, but it's
> not the same thing conceptually. If you want to put the size and the enable
> flag into a single variable, pick -1 for disable or so.

I might agree, and actually I prefer this solution... but in this way we
would use a different interface with respect to the equivalent vm_dirty_ratio
/ vm_dirty_bytes global settings (as well as dirty_background_ratio /
dirty_background_bytes).

IMHO it's better to use the same interface to avoid user
misunderstandings.
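
For reference, the global convention I'd like to mirror is roughly the
following (a simplified sketch; memcg->dirty_bytes and memcg->dirty_ratio are
placeholder names, not the final interface):

/*
 * Sketch: as with vm_dirty_bytes / vm_dirty_ratio, a *_bytes value of 0 means
 * "not set, use the ratio instead", not "zero-sized writeback cache".
 */
static unsigned long memcg_dirty_limit(struct mem_cgroup *memcg,
                                       unsigned long available_memory)
{
        if (memcg->dirty_bytes)
                return DIV_ROUND_UP(memcg->dirty_bytes, PAGE_SIZE);

        return (memcg->dirty_ratio * available_memory) / 100;
}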

Thanks,
-Andrea
From: Andrea Righi on
On Wed, Mar 03, 2010 at 05:21:32PM +0900, KAMEZAWA Hiroyuki wrote:
> On Wed, 3 Mar 2010 15:15:49 +0900
> KAMEZAWA Hiroyuki <kamezawa.hiroyu(a)jp.fujitsu.com> wrote:
>
> > Agreed.
> > Let's see how we can write the code in a clean way. (we have time ;)
> > For now, to me, disabling IRQs while holding lock_page_cgroup() seems to be
> > a little overkill. What I really want is lockless code... but it seems
> > impossible under the current implementation.
> >
> > I wonder whether the fact "the page is never uncharged under us" can give
> > us some chances ...Hmm.
> >
>
> How about this? Basically, I don't like duplicating information... so the
> number of new pcg_flags could perhaps be reduced.
>
> I hope this can be a hint for Andrea-san.
>
> ==
> ---
> include/linux/page_cgroup.h | 44 ++++++++++++++++++++-
> mm/memcontrol.c | 91 +++++++++++++++++++++++++++++++++++++++++++-
> 2 files changed, 132 insertions(+), 3 deletions(-)
>
> Index: mmotm-2.6.33-Mar2/include/linux/page_cgroup.h
> ===================================================================
> --- mmotm-2.6.33-Mar2.orig/include/linux/page_cgroup.h
> +++ mmotm-2.6.33-Mar2/include/linux/page_cgroup.h
> @@ -39,6 +39,11 @@ enum {
> PCG_CACHE, /* charged as cache */
> PCG_USED, /* this object is in use. */
> PCG_ACCT_LRU, /* page has been accounted for */
> > PCG_MIGRATE_LOCK, /* used for mutual exclusion of account migration */
> + PCG_ACCT_DIRTY,
> + PCG_ACCT_WB,
> + PCG_ACCT_WB_TEMP,
> + PCG_ACCT_UNSTABLE,
> };
>
> #define TESTPCGFLAG(uname, lname) \
> @@ -73,6 +78,23 @@ CLEARPCGFLAG(AcctLRU, ACCT_LRU)
> TESTPCGFLAG(AcctLRU, ACCT_LRU)
> TESTCLEARPCGFLAG(AcctLRU, ACCT_LRU)
>
> +SETPCGFLAG(AcctDirty, ACCT_DIRTY);
> +CLEARPCGFLAG(AcctDirty, ACCT_DIRTY);
> +TESTPCGFLAG(AcctDirty, ACCT_DIRTY);
> +
> +SETPCGFLAG(AcctWB, ACCT_WB);
> +CLEARPCGFLAG(AcctWB, ACCT_WB);
> +TESTPCGFLAG(AcctWB, ACCT_WB);
> +
> +SETPCGFLAG(AcctWBTemp, ACCT_WB_TEMP);
> +CLEARPCGFLAG(AcctWBTemp, ACCT_WB_TEMP);
> +TESTPCGFLAG(AcctWBTemp, ACCT_WB_TEMP);
> +
> +SETPCGFLAG(AcctUnstableNFS, ACCT_UNSTABLE);
> +CLEARPCGFLAG(AcctUnstableNFS, ACCT_UNSTABLE);
> +TESTPCGFLAG(AcctUnstableNFS, ACCT_UNSTABLE);
> +
> +
> static inline int page_cgroup_nid(struct page_cgroup *pc)
> {
> return page_to_nid(pc->page);
> @@ -82,7 +104,9 @@ static inline enum zone_type page_cgroup
> {
> return page_zonenum(pc->page);
> }
> -
> +/*
> + * lock_page_cgroup() should not be held under mapping->tree_lock
> + */
> static inline void lock_page_cgroup(struct page_cgroup *pc)
> {
> bit_spin_lock(PCG_LOCK, &pc->flags);
> @@ -93,6 +117,24 @@ static inline void unlock_page_cgroup(st
> bit_spin_unlock(PCG_LOCK, &pc->flags);
> }
>
> +/*
> + * Lock order is
> + * lock_page_cgroup()
> + * lock_page_cgroup_migrate()
> + * This lock is not for charge/uncharge but for moving account information,
> + * i.e. overwriting pc->mem_cgroup. The lock owner should guarantee by itself
> + * that the page is not uncharged while this lock is held.
> + */
> +static inline void lock_page_cgroup_migrate(struct page_cgroup *pc)
> +{
> + bit_spin_lock(PCG_MIGRATE_LOCK, &pc->flags);
> +}
> +
> +static inline void unlock_page_cgroup_migrate(struct page_cgroup *pc)
> +{
> + bit_spin_unlock(PCG_MIGRATE_LOCK, &pc->flags);
> +}
> +
> #else /* CONFIG_CGROUP_MEM_RES_CTLR */
> struct page_cgroup;
>
> Index: mmotm-2.6.33-Mar2/mm/memcontrol.c
> ===================================================================
> --- mmotm-2.6.33-Mar2.orig/mm/memcontrol.c
> +++ mmotm-2.6.33-Mar2/mm/memcontrol.c
> @@ -87,6 +87,10 @@ enum mem_cgroup_stat_index {
> MEM_CGROUP_STAT_PGPGOUT_COUNT, /* # of pages paged out */
> MEM_CGROUP_STAT_SWAPOUT, /* # of pages, swapped out */
> MEM_CGROUP_EVENTS, /* incremented at every pagein/pageout */
> + MEM_CGROUP_STAT_DIRTY,
> + MEM_CGROUP_STAT_WBACK,
> + MEM_CGROUP_STAT_WBACK_TEMP,
> + MEM_CGROUP_STAT_UNSTABLE_NFS,
>
> MEM_CGROUP_STAT_NSTATS,
> };
> @@ -1360,6 +1364,86 @@ done:
> }
>
> /*
> + * Update the file cache's status for a memcg. Before calling this,
> + * mapping->tree_lock should be held and preemption must be disabled.
> + * Then, it's guaranteed that the page is not uncharged while we
> + * access the page_cgroup. We can make use of that.
> + */
> +void mem_cgroup_update_stat_locked(struct page *page, int idx, bool set)
> +{
> + struct page_cgroup *pc;
> + struct mem_cgroup *mem;
> +
> + pc = lookup_page_cgroup(page);
> + /* Not accounted ? */
> + if (!PageCgroupUsed(pc))
> + return;
> + lock_page_cgroup_migrate(pc);
> + /*
> + * It's guaranteed that this page is never uncharged.
> + * The only racy problem is moving account among memcgs.
> + */
> + switch (idx) {
> + case MEM_CGROUP_STAT_DIRTY:
> + if (set)
> + SetPageCgroupAcctDirty(pc);
> + else
> + ClearPageCgroupAcctDirty(pc);
> + break;
> + case MEM_CGROUP_STAT_WBACK:
> + if (set)
> + SetPageCgroupAcctWB(pc);
> + else
> + ClearPageCgroupAcctWB(pc);
> + break;
> + case MEM_CGROUP_STAT_WBACK_TEMP:
> + if (set)
> + SetPageCgroupAcctWBTemp(pc);
> + else
> + ClearPageCgroupAcctWBTemp(pc);
> + break;
> + case MEM_CGROUP_STAT_UNSTABLE_NFS:
> + if (set)
> + SetPageCgroupAcctUnstableNFS(pc);
> + else
> + ClearPageCgroupAcctUnstableNFS(pc);
> + break;
> + default:
> + BUG();
> + break;
> + }
> + mem = pc->mem_cgroup;
> + if (set)
> + __this_cpu_inc(mem->stat->count[idx]);
> + else
> + __this_cpu_dec(mem->stat->count[idx]);
> + unlock_page_cgroup_migrate(pc);
> +}
> +
> +static void move_acct_information(struct mem_cgroup *from,
> + struct mem_cgroup *to,
> + struct page_cgroup *pc)
> +{
> + /* preemption is disabled, migration_lock is held. */
> + if (PageCgroupAcctDirty(pc)) {
> + __this_cpu_dec(from->stat->count[MEM_CGROUP_STAT_DIRTY]);
> + __this_cpu_inc(to->stat->count[MEM_CGROUP_STAT_DIRTY]);
> + }
> + if (PageCgroupAcctWB(pc)) {
> + __this_cpu_dec(from->stat->count[MEM_CGROUP_STAT_WBACK]);
> + __this_cpu_inc(to->stat->count[MEM_CGROUP_STAT_WBACK]);
> + }
> + if (PageCgroupAcctWBTemp(pc)) {
> + __this_cpu_dec(from->stat->count[MEM_CGROUP_STAT_WBACK_TEMP]);
> + __this_cpu_inc(to->stat->count[MEM_CGROUP_STAT_WBACK_TEMP]);
> + }
> + if (PageCgroupAcctUnstableNFS(pc)) {
> + __this_cpu_dec(from->stat->count[MEM_CGROUP_STAT_UNSTABLE_NFS]);
> + __this_cpu_inc(to->stat->count[MEM_CGROUP_STAT_UNSTABLE_NFS]);
> + }
> +}
> +
> +/*
> * size of first charge trial. "32" comes from vmscan.c's magic value.
> * TODO: maybe necessary to use big numbers in big irons.
> */
> @@ -1794,15 +1878,16 @@ static void __mem_cgroup_move_account(st
> VM_BUG_ON(!PageCgroupUsed(pc));
> VM_BUG_ON(pc->mem_cgroup != from);
>
> + preempt_disable();
> + lock_page_cgroup_migrate(pc);
> page = pc->page;
> if (page_mapped(page) && !PageAnon(page)) {
> /* Update mapped_file data for mem_cgroup */
> - preempt_disable();
> __this_cpu_dec(from->stat->count[MEM_CGROUP_STAT_FILE_MAPPED]);
> __this_cpu_inc(to->stat->count[MEM_CGROUP_STAT_FILE_MAPPED]);
> - preempt_enable();
> }
> mem_cgroup_charge_statistics(from, pc, false);
> + move_acct_information(from, to, pc);

Kame-san, a question. According to is_target_pte_for_mc() it seems we
don't move file pages across cgroups for now. If !PageAnon(page) we just
return 0 and the page won't be selected for migration in
mem_cgroup_move_charge_pte_range().
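
I.e., the relevant check is roughly this (paraphrasing, not the exact code):

        /* is_target_pte_for_mc(): only anonymous pages are candidates for
         * charge migration, file-backed pages are skipped entirely */
        if (!PageAnon(page))
                return 0;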

So, if I've understood correctly, the code is correct in perspective, but
right now it's unnecessary: file pages are not moved on task migration
across cgroups and, at the moment, there's no way for the file page
accounting statistics to go negative.

Or am I missing something?

Thanks,
-Andrea

> if (uncharge)
> /* This is not "cancel", but cancel_charge does all we need. */
> mem_cgroup_cancel_charge(from);
> @@ -1810,6 +1895,8 @@ static void __mem_cgroup_move_account(st
> /* caller should have done css_get */
> pc->mem_cgroup = to;
> mem_cgroup_charge_statistics(to, pc, true);
> + unlock_page_cgroup_migrate(pc);
> + preempt_enable();
> /*
> * We charges against "to" which may not have any tasks. Then, "to"
> * can be under rmdir(). But in current implementation, caller of
>