From: Minchan Kim on
On Fri, Jul 30, 2010 at 9:38 AM, Dave Hansen <dave(a)linux.vnet.ibm.com> wrote:
> On Thu, 2010-07-29 at 23:14 +0100, Russell King - ARM Linux wrote:
>> What we need is something which allows us to handle memory scattered
>> in several regions of the physical memory map, each bank being a
>> variable size.
>
> Russell, it does sound like you have a pretty pathological case here. :)
> It's not one that we've really attempted to address on any other
> architectures.
>
> Just to spell it out, if you have 4GB of physical address space, with
> 512k sections, you need 8192 sections, which means 8192*8 bytes, so it'd
> eat 64k of memory.  That's the normal SPARSEMEM case.
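>
> (Spelling that arithmetic out -- the 8 bytes being one struct
> mem_section entry, i.e. two word-sized fields on 32-bit:
>
>	4GB / 512k per section  = 8192 sections
>	8192 sections * 8 bytes = 64k of table, allocated up front.)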
>
> SPARSEMEM_EXTREME would be a bit different.  It's a 2-level lookup.
> You'd have 16 "section roots", each representing 256MB of address space.
> Each time we put memory under one of those roots, we'd fill in a
> 512-section second-level table, which is designed to always fit into one
> page.  If you start at 256MB, you won't waste all those entries.
>
> The disadvantage of SPARSEMEM_EXTREME is that it costs you the extra
> level in the lookup.  The space loss in ARM's case would only be 16
> pointers, which would more than be made up for by the other gains.
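>
> For reference, the lookup is essentially this -- a simplified sketch
> of __nr_to_section() from include/linux/mmzone.h, with the constants
> of the example above (512 sections per one-page root, 16 roots):
>
>	/* first level: NR_SECTION_ROOTS (16) pointers, one per 256MB */
>	extern struct mem_section *mem_section[NR_SECTION_ROOTS];
>
>	static inline struct mem_section *nr_to_section(unsigned long nr)
>	{
>		if (!mem_section[nr / SECTIONS_PER_ROOT])
>			return NULL;	/* nothing under this root */
>		/* second level: index into the one-page table */
>		return &mem_section[nr / SECTIONS_PER_ROOT]
>				   [nr % SECTIONS_PER_ROOT];
>	}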
>
> The other case where it really makes no sense is when you're populating
> a single (or small number) of sections, evenly across the address space.
> For instance, let's say you have 16 512k banks, evenly spaced at 256MB
> intervals:
>
>	512k@0x00000000
>	512k@0x10000000
>	512k@0x20000000
>	...
>	512k@0xF0000000
>
> If you use SPARSEMEM_EXTREME on that it will degenerate to having the
> same memory consumption as classic SPARSEMEM, along with the extra
> lookup of EXTREME.  But, I haven't heard you say that you have this kind
> of configuration, yet. :)
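>
> (Spelled out: each of those 16 banks lands under a different 256MB
> root, so all 16 second-level pages get allocated anyway --
> 16 roots * 4k per page = 64k, the same as the classic flat table,
> plus the extra indirection.  4k pages assumed.)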
>
> SPARSEMEM_EXTREME is really easy to test.  You just have to set it in
> your .config.  To get much use out of it, you'd also need to make the
> SECTION_SIZE smaller, like the 512k we were talking about.
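>
> As a sketch -- where SECTION_SIZE_BITS lives is arch-specific, e.g.
> arch/arm/include/asm/sparsemem.h on ARM:
>
>	CONFIG_SPARSEMEM=y
>	CONFIG_SPARSEMEM_EXTREME=y
>
> and, for 512k sections:
>
>	#define SECTION_SIZE_BITS	19	/* 2^19 = 512k */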
>

Thanks for the good explanation.
When this problem came up, I suggested using a 16M section size; the
space cost isn't big, but that failed since Russell didn't like it.

So I tried to enhance sparsemem to support holes, but you guys don't
like that either.  Frankly speaking, I don't like this approach myself,
but I think somebody has to take care of the problem.

Hmm, is it better to give up on Samsung's good embedded board?
That depends on Russell's opinion.

I will hold this patch until this controversial discussion reaches a
conclusion.
Thanks, Dave.

> -- Dave

--
Kind regards,
Minchan Kim
From: Christoph Lameter on
On Thu, 29 Jul 2010, Dave Hansen wrote:

> SPARSEMEM_EXTREME would be a bit different. It's a 2-level lookup.
> You'd have 16 "section roots", each representing 256MB of address space.
> Each time we put memory under one of those roots, we'd fill in a
> 512-section second-level table, which is designed to always fit into one
> page. If you start at 256MB, you won't waste all those entries.

That is certainly a solution to the !MMU case, and it would work very
much like a page table.  If you have an MMU, then the vmemmap sparsemem
configuration can take advantage of that to avoid the 2-level lookup.
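
For comparison, with SPARSEMEM_VMEMMAP the pfn <-> page conversion
collapses to pointer arithmetic into a virtually contiguous mem_map,
and the MMU does the "lookup" (roughly the definitions in
include/asm-generic/memory_model.h):

	/* vmemmap: virtually contiguous struct page array; only the
	 * ranges that back real memory are actually mapped. */
	#define vmemmap			((struct page *)VMEMMAP_START)
	#define __pfn_to_page(pfn)	(vmemmap + (pfn))
	#define __page_to_pfn(page)	(unsigned long)((page) - vmemmap)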
From: Dave Hansen on
On Fri, 2010-07-30 at 07:48 -0500, Christoph Lameter wrote:
> On Thu, 29 Jul 2010, Dave Hansen wrote:
>
> > SPARSEMEM_EXTREME would be a bit different. It's a 2-level lookup.
> > You'd have 16 "section roots", each representing 256MB of address space.
> > Each time we put memory under one of those roots, we'd fill in a
> > 512-section second-level table, which is designed to always fit into one
> > page. If you start at 256MB, you won't waste all those entries.
>
> That is certainly a solution to the !MMU case, and it would work very
> much like a page table.  If you have an MMU, then the vmemmap sparsemem
> configuration can take advantage of that to avoid the 2-level lookup.

Yup, couldn't agree more, Christoph.

It wouldn't hurt to have several of them available on ARM, since the
architecture is so diverse.

-- Dave

From: Russell King - ARM Linux on
On Fri, Jul 30, 2010 at 06:32:04PM +0900, Minchan Kim wrote:
> On Fri, Jul 30, 2010 at 5:55 AM, Dave Hansen <dave(a)linux.vnet.ibm.com> wrote:
> > If you free up parts of the mem_map[] array, how does the buddy
> > allocator still work?  I thought we required 'struct page's to be
> > contiguous and present for at least 2^MAX_ORDER-1 pages in one go.

(Dave, I don't seem to have your mail to reply to.)

What you say is correct; as a rule of thumb, memory banks tend to be
powers of two in size.

We do have the ability to change MAX_ORDER (which we need to do for
some platforms where there's only 1MB of DMA-able memory).
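
(On ARM that's the FORCE_MAX_ZONEORDER Kconfig knob.  As a sketch,
for a 1MB zone with 4k pages:

	CONFIG_FORCE_MAX_ZONEORDER=9	# 2^(9-1) pages * 4k = 1MB

since the largest buddy block is 2^(MAX_ORDER-1) pages.)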

However, in the case of two 512KB banks, the buddy allocator won't be
able to satisfy a 1MB request: it only has two separate 512K free
'pages' to deal with (2 x 512K free), and no 1MB free 'pages'
(0 x 1M free).
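
With 4k pages, the free-list accounting sketches out like this (an
illustration, not output from a real machine):

	/*
	 * Each 512k bank = 128 pages = one order-7 buddy block.
	 * zone->free_area[7].nr_free is 2 (the two banks), while
	 * zone->free_area[8].nr_free stays 0: the two blocks are not
	 * physical buddies, so they can never merge into a 1MB block.
	 */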
From: Russell King - ARM Linux on
On Fri, Jul 30, 2010 at 07:48:00AM -0500, Christoph Lameter wrote:
> On Thu, 29 Jul 2010, Dave Hansen wrote:
>
> > SPARSEMEM_EXTREME would be a bit different. It's a 2-level lookup.
> > You'd have 16 "section roots", each representing 256MB of address space.
> > Each time we put memory under one of those roots, we'd fill in a
> > 512-section second-level table, which is designed to always fit into one
> > page. If you start at 256MB, you won't waste all those entries.
>
> That is certainly a solution to the !MMU case, and it would work very
> much like a page table.  If you have an MMU, then the vmemmap sparsemem
> configuration can take advantage of that to avoid the 2-level lookup.

Looking at vmemmap sparsemem, we need to fix it, as the page table
allocation in there bypasses the arch-defined page table setup.

This causes a problem if you have 256-entry L2 page tables with no
room for the additional Linux VM PTE support bits (such as young,
dirty, etc), and so need to glue together two 256-entry L2 hardware
page tables, plus a Linux version holding that accounting, in each
page.  See arch/arm/include/asm/pgalloc.h.
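
Roughly, every PTE table page ends up looking like this (a sketch of
what the diagram in arch/arm/include/asm/pgtable.h describes):

	/*
	 * One 4k page as handed out by ARM's pte_alloc_one():
	 * two 256-entry hardware L2 tables (whose PTE format has no
	 * young/dirty bits), plus two parallel 256-entry "Linux"
	 * tables where the kernel keeps young/dirty/etc for the same
	 * entries.  Generic code that allocates page-table pages
	 * itself, rather than through the arch's pte_alloc, never
	 * creates the Linux half at all.
	 */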

So this causes a problem with vmemmap:

	pte_t entry;
	void *p = vmemmap_alloc_block_buf(PAGE_SIZE, node);
	if (!p)
		return NULL;
	entry = pfn_pte(__pa(p) >> PAGE_SHIFT, PAGE_KERNEL);

Are you willing for this stuff to be replaced by architectures as
necessary?
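
For illustration, the override being asked about could be as simple as
weakening the generic helper so an arch can supply one that goes
through its own pte_alloc machinery (hypothetical -- the __weak hook
is illustrative, not an existing interface):

	/* Hypothetical: a replaceable PTE-level populate step. */
	pte_t * __meminit __weak vmemmap_pte_populate(pmd_t *pmd,
						      unsigned long addr,
						      int node)
	{
		pte_t *pte = pte_offset_kernel(pmd, addr);

		if (pte_none(*pte)) {
			void *p = vmemmap_alloc_block_buf(PAGE_SIZE, node);
			if (!p)
				return NULL;
			set_pte_at(&init_mm, addr, pte,
				   pfn_pte(__pa(p) >> PAGE_SHIFT,
					   PAGE_KERNEL));
		}
		return pte;
	}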