From: Andrew Morton on
On Fri, 06 Aug 2010 00:10:58 +0800
Wu Fengguang <fengguang.wu(a)intel.com> wrote:

> Force a user visible low bound of 5% for the vm.dirty_ratio interface.
>
> Currently global_dirty_limits() applies a low bound of 5% for
> vm_dirty_ratio. This is not very user visible -- if the user sets
> vm.dirty_ratio=1, the operation seems to succeed but will be rounded up
> to 5% when used.
>
> Another problem is inconsistency: calc_period_shift() uses the plain
> vm_dirty_ratio value, which may be a problem when vm.dirty_ratio is set
> to < 5 by the user.

The changelog describes the old behaviour but doesn't describe the
proposed new behaviour.

> --- linux-next.orig/kernel/sysctl.c 2010-08-05 22:48:34.000000000 +0800
> +++ linux-next/kernel/sysctl.c 2010-08-05 22:48:47.000000000 +0800
> @@ -126,6 +126,7 @@ static int ten_thousand = 10000;
>
> /* this is needed for the proc_doulongvec_minmax of vm_dirty_bytes */
> static unsigned long dirty_bytes_min = 2 * PAGE_SIZE;
> +static int dirty_ratio_min = 5;
>
> /* this is needed for the proc_dointvec_minmax for [fs_]overflow UID and GID */
> static int maxolduid = 65535;
> @@ -1031,7 +1032,7 @@ static struct ctl_table vm_table[] = {
> .maxlen = sizeof(vm_dirty_ratio),
> .mode = 0644,
> .proc_handler = dirty_ratio_handler,
> - .extra1 = &zero,
> + .extra1 = &dirty_ratio_min,
> .extra2 = &one_hundred,
> },

I forget how the procfs core handles this. Presumably the write will
now fail with -EINVAL or something? So people's scripts will now
error out and their space shuttles will crash?

All of which illustrates why it's important to fully describe changes
in the changelog! So people can consider and discuss the end-user
implications of a change.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Wu Fengguang on
On Fri, Aug 06, 2010 at 07:34:01AM +0800, Andrew Morton wrote:
> On Fri, 06 Aug 2010 00:10:58 +0800
> Wu Fengguang <fengguang.wu(a)intel.com> wrote:
>
> > Force a user visible low bound of 5% for the vm.dirty_ratio interface.
> >
> > Currently global_dirty_limits() applies a low bound of 5% for
> > vm_dirty_ratio. This is not very user visible -- if the user sets
> > vm.dirty_ratio=1, the operation seems to succeed but will be rounded up
> > to 5% when used.
> >
> > Another problem is inconsistency: calc_period_shift() uses the plain
> > vm_dirty_ratio value, which may be a problem when vm.dirty_ratio is set
> > to < 5 by the user.
>
> The changelog describes the old behaviour but doesn't describe the
> proposed new behaviour.

Yeah, fixed below.

> > --- linux-next.orig/kernel/sysctl.c 2010-08-05 22:48:34.000000000 +0800
> > +++ linux-next/kernel/sysctl.c 2010-08-05 22:48:47.000000000 +0800
> > @@ -126,6 +126,7 @@ static int ten_thousand = 10000;
> >
> > /* this is needed for the proc_doulongvec_minmax of vm_dirty_bytes */
> > static unsigned long dirty_bytes_min = 2 * PAGE_SIZE;
> > +static int dirty_ratio_min = 5;
> >
> > /* this is needed for the proc_dointvec_minmax for [fs_]overflow UID and GID */
> > static int maxolduid = 65535;
> > @@ -1031,7 +1032,7 @@ static struct ctl_table vm_table[] = {
> > .maxlen = sizeof(vm_dirty_ratio),
> > .mode = 0644,
> > .proc_handler = dirty_ratio_handler,
> > - .extra1 = &zero,
> > + .extra1 = &dirty_ratio_min,
> > .extra2 = &one_hundred,
> > },
>
> I forget how the procfs core handles this. Presumably the write will
> now fail with -EINVAL or something?

Right.
# echo 111 > /proc/sys/vm/dirty_ratio
echo: write error: Invalid argument
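For reference, the range check is driven by the ctl_table's .extra1 (minimum) and .extra2 (maximum) pointers, and dirty_ratio_handler() is assumed here to hand the write to the generic min/max integer handler. The sketch below is a hypothetical user-space model, not the kernel's proc_dointvec_minmax() itself. minmax_store() is an illustrative name. It assumes an out-of-range value is rejected with -EINVAL and the stored value is left untouched, which is what the write error above suggests.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical model of a min/max-checked integer sysctl write.
 * min and max play the role of the ctl_table's .extra1/.extra2;
 * a NULL pointer means "no bound". On success the value is stored
 * and 0 is returned; otherwise *stored is left unchanged and
 * -EINVAL is returned. */
static int minmax_store(int *stored, int val, const int *min, const int *max)
{
	assert(stored != NULL);
	if ((min && val < *min) || (max && val > *max))
		return -EINVAL;
	*stored = val;
	return 0;
}
```

With min = 5 and max = 100, both 111 and 1 would be refused. The old table's minimum of 0 lets 1 through.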

> So people's scripts will now error out and their space shuttles will
> crash?

Looks like a serious problem. I'm now much more reluctant to push
this patch :)

> All of which illustrates why it's important to fully describe changes
> in the changelog! So people can consider and discuss the end-user
> implications of a change.

Good point. Here is the patch with updated changelog.

Thanks,
Fengguang
---
Subject: writeback: explicit low bound for vm.dirty_ratio
From: Wu Fengguang <fengguang.wu(a)intel.com>
Date: Thu Jul 15 10:28:57 CST 2010

Force a user visible low bound of 5% for the vm.dirty_ratio interface.

This is an interface change. When doing

echo N > /proc/sys/vm/dirty_ratio

where N < 5, the old behavior is to pretend to accept the value, while
the new behavior is to reject it explicitly with -EINVAL. This may
break user space programs that check the return value.
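To make the user-space impact concrete, here is a minimal sketch of a C program doing the equivalent of the echo above while actually checking the result. write_sysctl_int() is a hypothetical helper, not an existing library call. It returns 0 or a negative errno value.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: the C equivalent of `echo N > path`, except
 * that it reports failure instead of ignoring it. Returns 0 on
 * success or a negative errno value (e.g. -EINVAL if the kernel
 * rejects N). */
static int write_sysctl_int(const char *path, int value)
{
	char buf[32];
	int len, fd, ret = 0;
	ssize_t n;

	assert(path != NULL);
	len = snprintf(buf, sizeof(buf), "%d\n", value);
	/* O_TRUNC mirrors what the shell's `>` redirection does */
	fd = open(path, O_WRONLY | O_TRUNC);
	if (fd < 0)
		return -errno;
	n = write(fd, buf, (size_t)len);
	if (n < 0)
		ret = -errno;
	else if (n != len)
		ret = -EIO;	/* short write: treat as failure */
	if (close(fd) < 0 && ret == 0)
		ret = -errno;
	return ret;
}
```

Run as root on a kernel carrying this patch, write_sysctl_int("/proc/sys/vm/dirty_ratio", 1) would be expected to return -EINVAL. Before the patch it returned 0, and the value was clamped only when used. A shell script sees the same difference through the exit status of echo.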

Currently global_dirty_limits() applies a low bound of 5% for
vm_dirty_ratio. This is not very user visible -- if the user sets
vm.dirty_ratio=1, the operation seems to succeed but will be rounded up
to 5% when used.
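A simplified model of that clamping shows why vm.dirty_ratio=1 and vm.dirty_ratio=5 end up with the same threshold. This is not the kernel code: old_dirty_pages() is an illustrative name, and available_memory is taken as a plain page count.

```c
#include <assert.h>

/* Simplified model of the ratio branch of global_dirty_limits()
 * before this patch: the ratio is silently raised to 5 before the
 * threshold (in pages) is computed. */
static unsigned long old_dirty_pages(int vm_dirty_ratio,
				     unsigned long available_memory)
{
	int dirty_ratio = vm_dirty_ratio;

	/* the old sysctl table only enforced 0..100 */
	assert(vm_dirty_ratio >= 0 && vm_dirty_ratio <= 100);
	if (dirty_ratio < 5)
		dirty_ratio = 5;
	return (dirty_ratio * available_memory) / 100;
}
```

With a hypothetical 1,000,000 pages of available memory, old_dirty_pages(1, ...) and old_dirty_pages(5, ...) both give 50,000 pages, even though /proc/sys/vm/dirty_ratio still reads back 1.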

Another problem is inconsistency: calc_period_shift() uses the plain
vm_dirty_ratio value, which may be a problem when vm.dirty_ratio is set
to < 5 by the user.
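The inconsistency can be sketched the same way. This assumes calc_period_shift() computes its shift as 2 + ilog2(dirty_total - 1) from the unclamped ratio; check mm/page-writeback.c in your tree for the exact code. period_shift() and ilog2_ul() are illustrative names.

```c
#include <assert.h>

/* floor(log2(x)) for x > 0, standing in for the kernel's ilog2(). */
static int ilog2_ul(unsigned long x)
{
	int r = -1;

	assert(x > 0);
	while (x) {
		x >>= 1;
		r++;
	}
	return r;
}

/* Simplified model of calc_period_shift(): note that it uses the raw
 * vm_dirty_ratio, without the 5% floor applied in global_dirty_limits(). */
static int period_shift(int vm_dirty_ratio, unsigned long available_memory)
{
	unsigned long dirty_total = (vm_dirty_ratio * available_memory) / 100;

	/* dirty_total == 0 would wrap to ULONG_MAX below; 1 would give ilog2(0) */
	assert(dirty_total > 1);
	return 2 + ilog2_ul(dirty_total - 1);
}
```

For the same hypothetical 1,000,000 pages, a ratio of 1 gives a shift of 15, while the 5% actually used by global_dirty_limits() would correspond to 17. The two code paths therefore disagree about the dirty limit.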

CC: Peter Zijlstra <a.p.zijlstra(a)chello.nl>
Signed-off-by: Wu Fengguang <fengguang.wu(a)intel.com>
---
kernel/sysctl.c | 3 ++-
mm/page-writeback.c | 10 ++--------
2 files changed, 4 insertions(+), 9 deletions(-)

--- linux-next.orig/kernel/sysctl.c 2010-08-05 22:48:34.000000000 +0800
+++ linux-next/kernel/sysctl.c 2010-08-05 22:48:47.000000000 +0800
@@ -126,6 +126,7 @@ static int ten_thousand = 10000;

/* this is needed for the proc_doulongvec_minmax of vm_dirty_bytes */
static unsigned long dirty_bytes_min = 2 * PAGE_SIZE;
+static int dirty_ratio_min = 5;

/* this is needed for the proc_dointvec_minmax for [fs_]overflow UID and GID */
static int maxolduid = 65535;
@@ -1031,7 +1032,7 @@ static struct ctl_table vm_table[] = {
.maxlen = sizeof(vm_dirty_ratio),
.mode = 0644,
.proc_handler = dirty_ratio_handler,
- .extra1 = &zero,
+ .extra1 = &dirty_ratio_min,
.extra2 = &one_hundred,
},
{
--- linux-next.orig/mm/page-writeback.c 2010-08-05 22:48:42.000000000 +0800
+++ linux-next/mm/page-writeback.c 2010-08-05 22:48:47.000000000 +0800
@@ -415,14 +415,8 @@ void global_dirty_limits(unsigned long *

if (vm_dirty_bytes)
dirty = DIV_ROUND_UP(vm_dirty_bytes, PAGE_SIZE);
- else {
- int dirty_ratio;
-
- dirty_ratio = vm_dirty_ratio;
- if (dirty_ratio < 5)
- dirty_ratio = 5;
- dirty = (dirty_ratio * available_memory) / 100;
- }
+ else
+ dirty = (vm_dirty_ratio * available_memory) / 100;

if (dirty_background_bytes)
background = DIV_ROUND_UP(dirty_background_bytes, PAGE_SIZE);
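The simplification in global_dirty_limits() relies on the sysctl now refusing values below 5. A quick sketch, with illustrative function names and simplified from the hunk above, can confirm that the old and new ratio branches agree for every value the patched table accepts.

```c
#include <assert.h>

/* Ratio branch of global_dirty_limits() before the patch (simplified). */
static unsigned long dirty_before(int ratio, unsigned long avail)
{
	if (ratio < 5)
		ratio = 5;
	return (ratio * avail) / 100;
}

/* Ratio branch after the patch: no clamp, relying on the sysctl
 * (.extra1 = &dirty_ratio_min) to reject values below 5. */
static unsigned long dirty_after(int ratio, unsigned long avail)
{
	assert(ratio >= 5 && ratio <= 100);
	return (ratio * avail) / 100;
}

/* Returns 1 if both versions agree for every ratio in [5, 100]. */
static int patch_keeps_limits(unsigned long avail)
{
	for (int ratio = 5; ratio <= 100; ratio++)
		if (dirty_before(ratio, avail) != dirty_after(ratio, avail))
			return 0;
	return 1;
}
```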
From: KOSAKI Motohiro on
> Subject: writeback: explicit low bound for vm.dirty_ratio
> From: Wu Fengguang <fengguang.wu(a)intel.com>
> Date: Thu Jul 15 10:28:57 CST 2010
>
> Force a user visible low bound of 5% for the vm.dirty_ratio interface.
>
> This is an interface change. When doing
>
> echo N > /proc/sys/vm/dirty_ratio
>
> where N < 5, the old behavior is to pretend to accept the value, while
> the new behavior is to reject it explicitly with -EINVAL. This may
> break user space programs that check the return value.

Umm... I dislike this change. Is there any good reason to refuse an
explicit admin setting? Why is 1-4% so bad? Internal clipping can be
changed later, but explicit error behavior is hard to change later.

Personally, I prefer to either
- accept all values, or
- clip the value in dirty_ratio_handler.

Neither involves an explicit ABI change.

Thanks.



From: Neil Brown on
On Tue, 10 Aug 2010 12:12:06 +0900 (JST)
KOSAKI Motohiro <kosaki.motohiro(a)jp.fujitsu.com> wrote:

> > Subject: writeback: explicit low bound for vm.dirty_ratio
> > From: Wu Fengguang <fengguang.wu(a)intel.com>
> > Date: Thu Jul 15 10:28:57 CST 2010
> >
> > Force a user visible low bound of 5% for the vm.dirty_ratio interface.
> >
> > This is an interface change. When doing
> >
> > echo N > /proc/sys/vm/dirty_ratio
> >
> > where N < 5, the old behavior is to pretend to accept the value, while
> > the new behavior is to reject it explicitly with -EINVAL. This may
> > break user space programs that check the return value.
>
> Umm... I dislike this change. Is there any good reason to refuse an
> explicit admin setting? Why is 1-4% so bad? Internal clipping can be
> changed later, but explicit error behavior is hard to change later.

As a data point, I had a situation a while back where I needed a value below
1 to get the desired behaviour. The system had lots of RAM and fairly slow
write-back (over NFS), so a 'sync' could take minutes.

So I would much prefer allowing not only 1-4, but also fractional values!!!

I can see no justification at all for setting a lower bound of 5. Even zero
can be useful, mostly for testing purposes.
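Some back-of-the-envelope arithmetic illustrates the point. The numbers are hypothetical: 64 GiB of memory and 10 MiB/s of NFS write-back. The sketch also treats the ratio as a fraction of total memory, whereas the kernel applies it to dirtyable memory. Ratios are expressed in hundredths of a percent so that fractional values can be represented. Both function names are illustrative.

```c
#include <assert.h>

/* Bytes allowed to be dirty at a ratio given in hundredths of a
 * percent (500 = 5%, 10 = 0.1%). */
static unsigned long long dirty_limit_bytes(unsigned long long mem_bytes,
					    unsigned int ratio_centi_pct)
{
	assert(ratio_centi_pct <= 10000);
	return mem_bytes * ratio_centi_pct / 10000;
}

/* Seconds needed to write back dirty_bytes at bytes_per_sec, rounded up. */
static unsigned long long flush_seconds(unsigned long long dirty_bytes,
					unsigned long long bytes_per_sec)
{
	assert(bytes_per_sec > 0);
	return (dirty_bytes + bytes_per_sec - 1) / bytes_per_sec;
}
```

Under those assumptions, 5% is about 3.4 GB and roughly 328 seconds to flush, 1% is about 687 MB and 66 seconds, and 0.1% is about 69 MB and 7 seconds.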

NeilBrown

> Personally, I prefer to either
> - accept all values, or
> - clip the value in dirty_ratio_handler.
>
> Neither involves an explicit ABI change.
>
> Thanks.

From: Jan Kara on
On Tue 10-08-10 13:57:12, Neil Brown wrote:
> On Tue, 10 Aug 2010 12:12:06 +0900 (JST)
> KOSAKI Motohiro <kosaki.motohiro(a)jp.fujitsu.com> wrote:
>
> > > Subject: writeback: explicit low bound for vm.dirty_ratio
> > > From: Wu Fengguang <fengguang.wu(a)intel.com>
> > > Date: Thu Jul 15 10:28:57 CST 2010
> > >
> > > Force a user visible low bound of 5% for the vm.dirty_ratio interface.
> > >
> > > This is an interface change. When doing
> > >
> > > echo N > /proc/sys/vm/dirty_ratio
> > >
> > > where N < 5, the old behavior is to pretend to accept the value, while
> > > the new behavior is to reject it explicitly with -EINVAL. This may
> > > break user space programs that check the return value.
> >
> > Umm... I dislike this change. Is there any good reason to refuse an
> > explicit admin setting? Why is 1-4% so bad? Internal clipping can be
> > changed later, but explicit error behavior is hard to change later.
>
> As a data point, I had a situation a while back where I needed a value below
> 1 to get the desired behaviour. The system had lots of RAM and fairly slow
> write-back (over NFS), so a 'sync' could take minutes.
>
> So I would much prefer allowing not only 1-4, but also fractional values!!!
>
> I can see no justification at all for setting a lower bound of 5. Even zero
> can be useful, mostly for testing purposes.
On a recent kernel, you can use /proc/sys/vm/dirty_background_bytes and
dirty_bytes, which were introduced exactly for these purposes. Not that I
think the magic clipping at 5% is a good thing...
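The byte-based knob avoids the percentage granularity entirely. Below is a sketch of how the vm_dirty_bytes branch shown in the patch turns bytes into pages. It assumes 4 KiB pages. SKETCH_PAGE_SIZE and dirty_bytes_valid() are illustrative names, DIV_ROUND_UP is a local copy of the kernel macro, and the 2 * PAGE_SIZE minimum comes from dirty_bytes_min in kernel/sysctl.c.

```c
#include <assert.h>

#define SKETCH_PAGE_SIZE 4096UL	/* assumption: 4 KiB pages */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Stand-in for the sysctl check on vm.dirty_bytes, whose table entry
 * uses dirty_bytes_min = 2 * PAGE_SIZE as its lower bound. */
static int dirty_bytes_valid(unsigned long bytes)
{
	return bytes >= 2 * SKETCH_PAGE_SIZE;
}

/* Mirrors the vm_dirty_bytes branch of global_dirty_limits(): the
 * byte limit is converted to pages, rounding up. */
static unsigned long dirty_bytes_to_pages(unsigned long vm_dirty_bytes)
{
	assert(dirty_bytes_valid(vm_dirty_bytes));
	return DIV_ROUND_UP(vm_dirty_bytes, SKETCH_PAGE_SIZE);
}
```

For example, a 64 MiB limit becomes 16,384 pages, roughly 0.1% of a 64 GiB machine, which no integer dirty_ratio can express.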

Honza
--
Jan Kara <jack(a)suse.cz>
SUSE Labs, CR