From: Wu Fengguang on
> Christian, with this patch and more patches to scale down readahead
> size on small memory/device size, I guess it's no longer necessary to
> introduce a CONFIG_READAHEAD_SIZE?

This is the memory size based readahead limit :)

Thanks,
Fengguang
---
readahead: limit readahead size for small memory systems

When lifting the default readahead size from 128KB to 512KB,
make sure it won't add memory pressure to small memory systems.

For read-ahead, the memory pressure mainly comes from readahead buffers
consumed by too many concurrent streams. Context readahead adapts the
readahead size to the thrashing threshold well, so in principle we don't
need to adapt the default _max_ read-ahead size to memory pressure.

For read-around, the memory pressure mainly comes from read-around misses
on executables/libraries, which could be reduced by scaling down the
read-around size on fast "reclaim passes".

This patch presents a straightforward solution: limit the default
readahead size in proportion to available system memory, i.e.:
512MB mem => 512KB readahead size
128MB mem => 128KB readahead size
32MB mem => 32KB readahead size (minimal)

Strictly speaking, only the read-around size has to be limited. However,
we don't bother to separate read-around size from read-ahead size for now.

CC: Matt Mackall <mpm(a)selenic.com>
Signed-off-by: Wu Fengguang <fengguang.wu(a)intel.com>
---
mm/readahead.c | 25 +++++++++++++++++++++++++
1 file changed, 25 insertions(+)

--- linux.orig/mm/readahead.c 2010-02-21 22:42:15.000000000 +0800
+++ linux/mm/readahead.c 2010-02-21 23:43:14.000000000 +0800
@@ -19,6 +19,9 @@
#include <linux/pagevec.h>
#include <linux/pagemap.h>

+#define MIN_READAHEAD_PAGES DIV_ROUND_UP(VM_MIN_READAHEAD*1024, PAGE_CACHE_SIZE)
+
+static int __initdata user_defined_readahead_size;
static int __init config_readahead_size(char *str)
{
unsigned long bytes;
@@ -36,11 +39,33 @@ static int __init config_readahead_size(
bytes = 128 << 20;
}

+ user_defined_readahead_size = 1;
default_backing_dev_info.ra_pages = bytes / PAGE_CACHE_SIZE;
return 0;
}
early_param("readahead", config_readahead_size);

+static int __init readahead_init(void)
+{
+ /*
+ * Scale down default readahead size for small memory systems.
+ * For example, a 64MB box will do 64KB read-ahead/read-around
+ * instead of the default 512KB.
+ *
+ * Note that the default readahead size will also be scaled down
+ * for small devices in add_disk().
+ */
+ if (!user_defined_readahead_size) {
+ unsigned long max = roundup_pow_of_two(totalram_pages / 1024);
+ if (default_backing_dev_info.ra_pages > max)
+ default_backing_dev_info.ra_pages = max;
+ if (default_backing_dev_info.ra_pages < MIN_READAHEAD_PAGES)
+ default_backing_dev_info.ra_pages = MIN_READAHEAD_PAGES;
+ }
+ return 0;
+}
+fs_initcall(readahead_init);
+
/*
* Initialise a struct file's readahead state. Assumes that the caller has
* memset *ra to zero.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Wu Fengguang on
> +unsigned long max_readahead_pages = VM_MAX_READAHEAD * 1024 / PAGE_CACHE_SIZE;
> +
> +static int __init readahead(char *str)
> +{
> + unsigned long bytes;
> +
> + if (!str)
> + return -EINVAL;
> + bytes = memparse(str, &str);
> + if (*str != '\0')
> + return -EINVAL;
> +
> + if (bytes) {
> + if (bytes < PAGE_CACHE_SIZE) /* missed 'k'/'m' suffixes? */
> + return -EINVAL;
> + if (bytes > 128 << 20) /* limit to 128MB */
> + bytes = 128 << 20;
> + }
> +
> + max_readahead_pages = bytes / PAGE_CACHE_SIZE;
> + default_backing_dev_info.ra_pages = max_readahead_pages;
> + return 0;
> +}
> +
> +early_param("readahead", readahead);

This further optimizes away max_readahead_pages :)

---
make default readahead size a kernel parameter

From: Nikanth Karthikesan <knikanth(a)suse.de>

Add a new kernel parameter "readahead", which allows the user to override
the static VM_MAX_READAHEAD=512KB.

CC: Ankit Jain <radical(a)gmail.com>
CC: Dave Chinner <david(a)fromorbit.com>
CC: Christian Ehrhardt <ehrhardt(a)linux.vnet.ibm.com>
Signed-off-by: Nikanth Karthikesan <knikanth(a)suse.de>
Signed-off-by: Wu Fengguang <fengguang.wu(a)intel.com>
---
Documentation/kernel-parameters.txt | 4 ++++
block/blk-core.c | 3 +--
fs/fuse/inode.c | 2 +-
mm/readahead.c | 22 ++++++++++++++++++++++
4 files changed, 28 insertions(+), 3 deletions(-)

--- linux.orig/Documentation/kernel-parameters.txt 2010-02-21 22:41:29.000000000 +0800
+++ linux/Documentation/kernel-parameters.txt 2010-02-21 22:41:30.000000000 +0800
@@ -2174,6 +2174,10 @@ and is between 256 and 4096 characters.
Run specified binary instead of /init from the ramdisk,
used for early userspace startup. See initrd.

+ readahead=nn[KM]
+ Default max readahead size for block devices.
+ Range: 0; 4k - 128m
+
reboot= [BUGS=X86-32,BUGS=ARM,BUGS=IA-64] Rebooting mode
Format: <reboot_mode>[,<reboot_mode2>[,...]]
See arch/*/kernel/reboot.c or arch/*/kernel/process.c
--- linux.orig/block/blk-core.c 2010-02-21 22:41:29.000000000 +0800
+++ linux/block/blk-core.c 2010-02-21 22:41:30.000000000 +0800
@@ -498,8 +498,7 @@ struct request_queue *blk_alloc_queue_no

q->backing_dev_info.unplug_io_fn = blk_backing_dev_unplug;
q->backing_dev_info.unplug_io_data = q;
- q->backing_dev_info.ra_pages =
- (VM_MAX_READAHEAD * 1024) / PAGE_CACHE_SIZE;
+ q->backing_dev_info.ra_pages = default_backing_dev_info.ra_pages;
q->backing_dev_info.state = 0;
q->backing_dev_info.capabilities = BDI_CAP_MAP_COPY;
q->backing_dev_info.name = "block";
--- linux.orig/fs/fuse/inode.c 2010-02-21 22:41:29.000000000 +0800
+++ linux/fs/fuse/inode.c 2010-02-21 22:41:30.000000000 +0800
@@ -870,7 +870,7 @@ static int fuse_bdi_init(struct fuse_con
int err;

fc->bdi.name = "fuse";
- fc->bdi.ra_pages = (VM_MAX_READAHEAD * 1024) / PAGE_CACHE_SIZE;
+ fc->bdi.ra_pages = default_backing_dev_info.ra_pages;
fc->bdi.unplug_io_fn = default_unplug_io_fn;
/* fuse does it's own writeback accounting */
fc->bdi.capabilities = BDI_CAP_NO_ACCT_WB;
--- linux.orig/mm/readahead.c 2010-02-21 22:41:29.000000000 +0800
+++ linux/mm/readahead.c 2010-02-21 22:42:15.000000000 +0800
@@ -19,6 +19,28 @@
#include <linux/pagevec.h>
#include <linux/pagemap.h>

+static int __init config_readahead_size(char *str)
+{
+ unsigned long bytes;
+
+ if (!str)
+ return -EINVAL;
+ bytes = memparse(str, &str);
+ if (*str != '\0')
+ return -EINVAL;
+
+ if (bytes) {
+ if (bytes < PAGE_CACHE_SIZE) /* missed 'k'/'m' suffixes? */
+ return -EINVAL;
+ if (bytes > 128 << 20) /* limit to 128MB */
+ bytes = 128 << 20;
+ }
+
+ default_backing_dev_info.ra_pages = bytes / PAGE_CACHE_SIZE;
+ return 0;
+}
+early_param("readahead", config_readahead_size);
+
/*
* Initialise a struct file's readahead state. Assumes that the caller has
* memset *ra to zero.
From: Christian Ehrhardt on


Wu Fengguang wrote:
> Nikanth,
>
>> I didn't want to impose artificial restrictions. I think Wu's patch set would
>> be adding some restrictions, like minimum readahead. He could fix it when he
>> modifies the patch to include in his patch set.
>
> OK, I imposed a larger bound -- 128MB.
> And values 1-4095 (more precisely: 1 to PAGE_CACHE_SIZE - 1) are prohibited,
> mainly to catch "readahead=128" where the user really means 128 _KB_ readahead.
>
> Christian, with this patch and more patches to scale down readahead
> size on small memory/device size, I guess it's no longer necessary to
> introduce a CONFIG_READAHEAD_SIZE?

Yes, as I mentioned before, a kernel parameter supersedes a config symbol
in my opinion too.
-> agreed

> Thanks,
> Fengguang
> ---

--

Grüsse / regards, Christian Ehrhardt
IBM Linux Technology Center, System z Linux Performance
From: Dave Chinner on
On Sun, Feb 21, 2010 at 10:26:00PM +0800, Wu Fengguang wrote:
> Nikanth,
>
> > > > + readahead= Default readahead value for block devices.
> > > > +
> > >
> > > I think the description should define the units (kb) and valid value
> > > ranges, e.g. page size up to something not excessive, say 65536kb. The
> > > above description is, IMO, useless without referring to the source to
> > > find out this information....
> > >
> >
> > The parameter can be specified with or without any suffix (k/m/g) that the
> > memparse() helper function accepts. So it can take 1M, 1024k, 1050620. I
> > checked other parameters that use memparse() to get similar values and they
> > didn't document it. Maybe this should be described here.
>
> Hope this helps clarify things to user:
>
> + readahead=nn[KM]
> + Default max readahead size for block devices.
> + Range: 0; 4k - 128m

Yes, that is exactly what I was thinking of. Thanks.

Cheers,

Dave.
--
Dave Chinner
david(a)fromorbit.com