From: Daisuke Nishimura on
On Mon, 15 Mar 2010 00:26:39 +0100, Andrea Righi <arighi(a)develer.com> wrote:
> Document cgroup dirty memory interfaces and statistics.
>
> Signed-off-by: Andrea Righi <arighi(a)develer.com>
> ---
> Documentation/cgroups/memory.txt | 36 ++++++++++++++++++++++++++++++++++++
> 1 files changed, 36 insertions(+), 0 deletions(-)
>
> diff --git a/Documentation/cgroups/memory.txt b/Documentation/cgroups/memory.txt
> index 49f86f3..38ca499 100644
> --- a/Documentation/cgroups/memory.txt
> +++ b/Documentation/cgroups/memory.txt
> @@ -310,6 +310,11 @@ cache - # of bytes of page cache memory.
> rss - # of bytes of anonymous and swap cache memory.
> pgpgin - # of pages paged in (equivalent to # of charging events).
> pgpgout - # of pages paged out (equivalent to # of uncharging events).
> +filedirty - # of pages that are waiting to get written back to the disk.
> +writeback - # of pages that are actively being written back to the disk.
> +writeback_tmp - # of pages used by FUSE for temporary writeback buffers.
> +nfs - # of NFS pages sent to the server, but not yet committed to
> + the actual storage.
> active_anon - # of bytes of anonymous and swap cache memory on active
> lru list.
> inactive_anon - # of bytes of anonymous memory and swap cache memory on
> @@ -345,6 +350,37 @@ Note:
> - a cgroup which uses hierarchy and it has child cgroup.
> - a cgroup which uses hierarchy and not the root of hierarchy.
>
> +5.4 dirty memory
> +
> + Control the maximum amount of dirty pages a cgroup can have at any given time.
> +
> + Limiting dirty memory fixes the maximum amount of dirty (hard to
> + reclaim) page cache that any cgroup can use. So, with multiple cgroup writers,
> + none of them can consume more than its designated share of dirty
> + pages; a writer that crosses its limit is forced to perform write-out.
> +
> + The interface is equivalent to the procfs interface /proc/sys/vm/dirty_*.
> + A limit can be configured to trigger either direct writeback or
> + background writeback performed by the per-bdi flusher threads.
> +
> + Per-cgroup dirty limits can be set using the following files in the cgroupfs:
> +
> + - memory.dirty_ratio: the amount of dirty memory, as a percentage of
> + cgroup memory, at which a process generating disk writes inside the
> + cgroup will itself start writing out dirty data.
> +
> + - memory.dirty_bytes: the amount of dirty memory of the cgroup (in
> + bytes) at which a process generating disk writes will itself start
> + writing out dirty data.
> +
> + - memory.dirty_background_ratio: the amount of dirty memory, as a
> + percentage of cgroup memory, at which background writeback kernel
> + threads will start writing out dirty data.
> +
> + - memory.dirty_background_bytes: the amount of dirty memory of the cgroup (in
> + bytes) at which background writeback kernel threads will start writing out
> + dirty data.
> +
>
It would be better to document what these files mean for the root cgroup.
We cannot write any value to them; IOW, we cannot control the dirty limit of the root cgroup.
And they show the same values as the global ones (strictly speaking, that is not true,
because the global values can change; do we need a hook in mem_cgroup_dirty_read()?).

Thanks,
Daisuke Nishimura.

> 6. Hierarchy support
>
> --
> 1.6.3.3
>
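A minimal userspace sketch of how the four knobs might combine, assuming (as the patch text says) they mirror /proc/sys/vm/dirty_*: a non-zero *_bytes value takes precedence over the matching *_ratio, and the ratio is applied to the cgroup's memory, here simply passed in as a page count. The struct, enum and function names are hypothetical, not the patch's code. The clamp of the background threshold to half the direct one follows the global get_dirty_limits() logic of this era; whether the per-cgroup code does the same is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical snapshot of one cgroup's four memory.dirty_* knobs. */
struct cg_dirty_param {
    unsigned int dirty_ratio;            /* memory.dirty_ratio, percent */
    uint64_t dirty_bytes;                /* memory.dirty_bytes, 0 = unset */
    unsigned int dirty_background_ratio; /* memory.dirty_background_ratio */
    uint64_t dirty_background_bytes;     /* memory.dirty_background_bytes */
};

enum wb_action { WB_NONE, WB_BACKGROUND, WB_DIRECT };

/* Convert a byte limit to pages, rounding up. */
static uint64_t bytes_to_pages(uint64_t bytes, uint64_t page_size)
{
    assert(page_size > 0);
    return (bytes + page_size - 1) / page_size;
}

/*
 * Compute both thresholds in pages. A non-zero *_bytes value wins over the
 * matching *_ratio, which is applied to cg_pages (the cgroup's memory).
 */
static void cg_dirty_limits(const struct cg_dirty_param *p, uint64_t cg_pages,
                            uint64_t page_size, uint64_t *dirty,
                            uint64_t *background)
{
    *dirty = p->dirty_bytes ? bytes_to_pages(p->dirty_bytes, page_size)
                            : cg_pages * p->dirty_ratio / 100;
    *background = p->dirty_background_bytes
                      ? bytes_to_pages(p->dirty_background_bytes, page_size)
                      : cg_pages * p->dirty_background_ratio / 100;
    /* Keep background writeback starting below the direct threshold. */
    if (*background >= *dirty)
        *background = *dirty / 2;
}

/* Decide what a writer should do when the cgroup holds nr_dirty dirty pages. */
static enum wb_action cg_check_dirty(const struct cg_dirty_param *p,
                                     uint64_t cg_pages, uint64_t page_size,
                                     uint64_t nr_dirty)
{
    uint64_t dirty, background;

    cg_dirty_limits(p, cg_pages, page_size, &dirty, &background);
    if (nr_dirty > dirty)
        return WB_DIRECT;     /* the writer writes out dirty data itself */
    if (nr_dirty > background)
        return WB_BACKGROUND; /* wake the per-bdi flusher threads */
    return WB_NONE;
}
```

For example, with dirty_ratio=20 and dirty_background_ratio=10 on a 262144-page cgroup (1 GiB with 4 KiB pages), cg_dirty_limits() yields 52428 and 26214 pages respectively.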
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Greg Thelen on
On Mon, Mar 15, 2010 at 11:41 PM, Daisuke Nishimura
<nishimura(a)mxp.nes.nec.co.jp> wrote:
> On Mon, 15 Mar 2010 00:26:39 +0100, Andrea Righi <arighi(a)develer.com> wrote:
>> Document cgroup dirty memory interfaces and statistics.
>>
>> Signed-off-by: Andrea Righi <arighi(a)develer.com>
>> ---
>>  Documentation/cgroups/memory.txt |   36 ++++++++++++++++++++++++++++++++++++
>>  1 files changed, 36 insertions(+), 0 deletions(-)
>>
>> diff --git a/Documentation/cgroups/memory.txt b/Documentation/cgroups/memory.txt
>> index 49f86f3..38ca499 100644
>> --- a/Documentation/cgroups/memory.txt
>> +++ b/Documentation/cgroups/memory.txt
>> @@ -310,6 +310,11 @@ cache            - # of bytes of page cache memory.
>>  rss              - # of bytes of anonymous and swap cache memory.
>>  pgpgin           - # of pages paged in (equivalent to # of charging events).
>>  pgpgout          - # of pages paged out (equivalent to # of uncharging events).
>> +filedirty        - # of pages that are waiting to get written back to the disk.
>> +writeback        - # of pages that are actively being written back to the disk.
>> +writeback_tmp    - # of pages used by FUSE for temporary writeback buffers.
>> +nfs              - # of NFS pages sent to the server, but not yet committed to
>> +                   the actual storage.

Should these new memory.stat counters (filedirty, etc) report byte
counts rather than page counts? I am thinking that byte counters
would make reporting more obvious depending on how heterogeneous page
sizes are used. Byte counters would also agree with /proc/meminfo.
Within the kernel we could still maintain page counts. The only
change would be to the reporting routine, mem_cgroup_get_local_stat(),
which would scale the page counts by PAGE_SIZE as it does for
cache, rss, etc.
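A minimal userspace sketch of the reporting-side change proposed above: counters stay in pages and are multiplied by the page size only when formatted. The enum, the names array and cg_format_stat() are hypothetical stand-ins for what mem_cgroup_get_local_stat() would do, not its actual code; in the kernel the multiplier would be PAGE_SIZE rather than a parameter.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical indices for the new counters, kept internally in pages. */
enum cg_stat { CG_FILEDIRTY, CG_WRITEBACK, CG_WRITEBACK_TMP, CG_NFS, CG_NR_STAT };

static const char *const cg_stat_names[CG_NR_STAT] = {
    "filedirty", "writeback", "writeback_tmp", "nfs",
};

/*
 * Format one "name value" line per counter into buf, scaling page counts
 * to bytes at report time. Returns the number of characters written, or
 * -1 if buf is too small.
 */
static int cg_format_stat(char *buf, size_t len,
                          const uint64_t pages[CG_NR_STAT], uint64_t page_size)
{
    size_t off = 0;

    assert(buf != NULL && len > 0);
    for (int i = 0; i < CG_NR_STAT; i++) {
        unsigned long long bytes = (unsigned long long)(pages[i] * page_size);
        int n = snprintf(buf + off, len - off, "%s %llu\n",
                         cg_stat_names[i], bytes);

        if (n < 0 || (size_t)n >= len - off)
            return -1; /* output would be truncated */
        off += (size_t)n;
    }
    return (int)off;
}
```

With page counts {3, 1, 0, 2} and 4096-byte pages this produces "filedirty 12288", "writeback 4096", "writeback_tmp 0" and "nfs 8192", one per line; the in-kernel accounting itself is untouched.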

--
Greg
From: Balbir Singh on
* Greg Thelen <gthelen(a)google.com> [2010-03-17 09:48:18]:

> On Mon, Mar 15, 2010 at 11:41 PM, Daisuke Nishimura
> <nishimura(a)mxp.nes.nec.co.jp> wrote:
> > On Mon, 15 Mar 2010 00:26:39 +0100, Andrea Righi <arighi(a)develer.com> wrote:
> >> [...]
>
> Should these new memory.stat counters (filedirty, etc) report byte
> counts rather than page counts? I am thinking that byte counters
> would make reporting more obvious depending on how heterogeneous page
> sizes are used. Byte counters would also agree with /proc/meminfo.
> Within the kernel we could still maintain page counts. The only
> change would be to the reporting routine, mem_cgroup_get_local_stat(),
> which would scale the page counts by PAGE_SIZE as it does for
> cache, rss, etc.
>

I agree, byte counts would be better than page counts. pgpgin and
pgpgout are special cases: what matters there is the number of pages,
not their size, due to the nature of the operation.

--
Three Cheers,
Balbir
From: Andrea Righi on
On Tue, Mar 16, 2010 at 04:41:21PM +0900, Daisuke Nishimura wrote:
> On Mon, 15 Mar 2010 00:26:39 +0100, Andrea Righi <arighi(a)develer.com> wrote:
> > [...]
> >
> It would be better to document what these files mean for the root cgroup.
> We cannot write any value to them; IOW, we cannot control the dirty limit of the root cgroup.

OK.

> And they show the same values as the global ones (strictly speaking, that is not true,
> because the global values can change; do we need a hook in mem_cgroup_dirty_read()?).

OK, in mem_cgroup_dirty_read() we can just return the system-wide value if
mem_cgroup_is_root(). I will change this in the next version.

Thanks,
-Andrea