From: Dave Hansen on
On Mon, 2010-08-09 at 12:53 -0500, Nathan Fontenot wrote:
> This set of patches de-couples the idea that there is a single
> directory in sysfs for each memory section. The intent of the
> patches is to reduce the number of sysfs directories created to
> resolve a boot-time performance issue. On very large systems
> boot time are getting very long (as seen on powerpc hardware)
> due to the enormous number of sysfs directories being created.
> On a system with 1 TB of memory we create ~63,000 directories.
> For even larger systems boot times are being measured in hours.

Hi Nathan,

The set is looking pretty good to me. We _might_ want to up the ante in
the future and allow it to be even more dynamic than this, but this
looks like a good start to me.

BTW, have you taken a look at what the hotplug events look like if only
a single section (not filling up a whole block) is added?

Feel free to add my:

Acked-by: Dave Hansen <dave(a)linux.vnet.ibm.com>

-- Dave

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Andrew Morton on
On Mon, 09 Aug 2010 12:53:00 -0500
Nathan Fontenot <nfont(a)austin.ibm.com> wrote:

> This set of patches de-couples the idea that there is a single
> directory in sysfs for each memory section. The intent of the
> patches is to reduce the number of sysfs directories created to
> resolve a boot-time performance issue. On very large systems
> boot time are getting very long (as seen on powerpc hardware)
> due to the enormous number of sysfs directories being created.
> On a system with 1 TB of memory we create ~63,000 directories.
> For even larger systems boot times are being measured in hours.

And those "hours" are mainly due to this problem, I assume.

> This set of patches allows for each directory created in sysfs
> to cover more than one memory section. The default behavior for
> sysfs directory creation is the same, in that each directory
> represents a single memory section. A new file 'end_phys_index'
> in each directory contains the physical_id of the last memory
> section covered by the directory so that users can easily
> determine the memory section range of a directory.

What you're proposing appears to be a non-back-compatible
userspace-visible change. This is a big issue!

It's not an unresolvable issue, as this is a must-fix problem. But you
should tell us what your proposal is to prevent breakage of existing
installations. A Kconfig option would be good, but a boot-time kernel
command line option which selects the new format would be much better.

However you didn't mention this issue at all, and it's the most
important one.


> Updates for version 5 of the patchset include the following:
>
> Patch 4/8 Add mutex for add/remove of memory blocks
> - Define the mutex using DEFINE_MUTEX macro.
>
> Patch 8/8 Update memory-hotplug documentation
> - Add information concerning memory holes in phys_index..end_phys_index.

And you forgot to tell us how long those machines boot with the
patchset applied, which is the entire point of the patchset!

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Dave Hansen on
On Thu, 2010-08-12 at 12:08 -0700, Andrew Morton wrote:
> > This set of patches allows for each directory created in sysfs
> > to cover more than one memory section. The default behavior for
> > sysfs directory creation is the same, in that each directory
> > represents a single memory section. A new file 'end_phys_index'
> > in each directory contains the physical_id of the last memory
> > section covered by the directory so that users can easily
> > determine the memory section range of a directory.
>
> What you're proposing appears to be a non-back-compatible
> userspace-visible change. This is a big issue!

Nathan, one thought to get around this at the moment would be to bump up
the size that we export in /sys/devices/system/memory/block_size_bytes.
I think you have already done most of the hard work to accomplish
this.

You can still add the end_phys_index stuff. But, for now, it would
always be equal to start_phys_index.

-- Dave

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/