From: Russell King - ARM Linux
On Thu, Jul 29, 2010 at 02:55:53PM -0500, Christoph Lameter wrote:
> On Thu, 29 Jul 2010, Russell King - ARM Linux wrote:
>
> > And no, setting the sparse section size to 512kB doesn't work - memory is
> > offset by 256MB already, so you need a sparsemem section array of 1024
> > entries just to cover that - with the full 256MB populated, that's 512
> > unused entries followed by 512 used entries. That too is going to waste
> > memory like nobody's business.
>
> SPARSEMEM EXTREME does not handle that?
>
> Some ARMs seem to have MMUs. If so then use SPARSEMEM_VMEMMAP. You can map
> 4k pages for the mmap through a page table. Redirect unused 4k blocks to
> the NULL page.

We're going over old ground that has already been covered in this very
thread. I have no inclination to repeat the arguments.
From: Russell King - ARM Linux
On Thu, Jul 29, 2010 at 01:55:19PM -0700, Dave Hansen wrote:
> Could you give some full examples of how the memory is laid out on these
> systems? I'm having a bit of a hard time visualizing it.

In the example I quote, there are four banks of memory, which start at
0x10000000, 0x14000000, 0x18000000 and 0x1c000000 physical, which can
be populated or empty, each one in multiples of 512KB up to the maximum
64MB.

There are other systems where memory starts at 0xc0000000 and 0xc8000000
physical, and the memory size is either 32MB or 64MB.

We also have one class of systems where memory starts at 0xc0000000,
0xc1000000, 0xc2000000, etc - but I don't know what the minimum
populated memory size in any one region is.
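
To make the first example concrete, here is a toy C description of
those four banks. This is purely illustrative: the struct and its
fields are loosely modelled on ARM's boot-time bank bookkeeping, not
the kernel's actual definitions.

#include <stdint.h>

struct bank {
	uint32_t start;	/* physical base address */
	uint32_t size;	/* multiple of 512KB, up to 64MB */
};

/* All four banks fully populated (64MB each, so they end up contiguous): */
static const struct bank banks[] = {
	{ 0x10000000, 64 << 20 },
	{ 0x14000000, 64 << 20 },
	{ 0x18000000, 64 << 20 },
	{ 0x1c000000, 64 << 20 },
};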

Things that we've tried over the years:
1. flatmem, remapping memory into one contiguous chunk (which can cause
problems when parts of the kernel assume that the underlying phys
space is contiguous.)
2. flatmem with holes and a 1:1 v:p mapping (was told we shouldn't be
doing this - and it becomes impossible with sparsely populated banks
of memory split over a large range.)
3. discontigmem (was told this was too heavy, we're not NUMA, we shouldn't
be using this, and it will be deprecated, use sparsemem instead)
4. sparsemem

What we need is something which allows us to handle memory scattered
in several regions of the physical memory map, each bank being a
variable size.

From what I've seen through this thread, there is no support for such
a setup. (People seem to have their opinions on this, and will tell
you what you should be using, only for someone else to tell you that
you shouldn't be using that! - *) This isn't something new for ARM,
we've had these kinds of issues for the last 10 or more years.

What is new is that we're now seeing systems where the first bank of
memory to be populated is at a higher physical address than the second
bank, and therefore people are setting up v:p mappings which switch the
ordering of these - but this I think is unrelated to the discussion at
hand.

* - this is why I'm exasperated with this latest discussion on it.

While we're here, I'll repeat a point made earlier.

We don't map lowmem in using 4K pages. That would be utter madness
given the small TLB size ARM processors tend to have. Instead, we
map lowmem using 1MB section mappings (which occupy one entry in the
L1 page table.) Modifying these mappings requires all page tables
in the system to be updated - which given that we're SMP etc. now
is not practical.
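
For illustration, each such section mapping is a single 32-bit
first-level descriptor. A minimal sketch of filling one entry follows;
the bit layout is the ARM short-descriptor section format, but the
macro and helper names are mine, not the kernel's:

#include <stdint.h>

#define SECTION_SHIFT	20		/* 1MB of VA per L1 entry */
#define SECT_TYPE	(2u << 0)	/* bits [1:0] = 0b10: section */
#define SECT_AP_PRIV_RW	(1u << 10)	/* AP: kernel read/write access */

static void map_section(uint32_t *l1, uint32_t virt, uint32_t phys)
{
	/* One L1 slot covers 1MB of virtual space; changing it later
	 * means updating the L1 table of every task in the system. */
	l1[virt >> SECTION_SHIFT] =
		(phys & ~((1u << SECTION_SHIFT) - 1)) |
		SECT_AP_PRIV_RW | SECT_TYPE;
}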

So the idea that we can remap a section of memory for the mem_map
struct (as suggested several times in this thread) isn't possible
without having it allocated in something like vmalloc space.
Plus, of course, if you did such a remapping in the lowmem
mapping, the pages that were there would become unusable, as they lose
their virtual mapping (thereby causing phys_to_virt/virt_to_phys
on their addresses to break.) So you only gain even more
problems by this method.
From: Christoph Lameter
On Thu, 29 Jul 2010, Russell King - ARM Linux wrote:

> We don't map lowmem in using 4K pages. That would be utter madness
> given the small TLB size ARM processors tend to have. Instead, we
> map lowmem using 1MB section mappings (which occupy one entry in the
> L1 page table.) Modifying these mappings requires all page tables
> in the system to be updated - which given that we're SMP etc. now
> is not practical.
>
> So the idea that we can remap a section of memory for the mem_map
> struct (as suggested several times in this thread) isn't possible
> without having it allocated in something like vmalloc space.
> Plus, of course, if you did such a remapping in the lowmem
> mapping, the pages that were there would become unusable, as they lose
> their virtual mapping (thereby causing phys_to_virt/virt_to_phys
> on their addresses to break.) So you only gain even more
> problems by this method.

A 1M page dedicated to vmemmap would only be used for memmap and only be
addressed using the virtual memory address. The pfn to page and vice versa
mapping that is the basic mechanism for virt_to_page and friends is then
straightforward. Nothing breaks.

memory-model.h:
#elif defined(CONFIG_SPARSEMEM_VMEMMAP)

/* memmap is virtually contiguous. */
#define __pfn_to_page(pfn) (vmemmap + (pfn))
#define __page_to_pfn(page) (unsigned long)((page) - vmemmap)


However, if you have such a sparse address space, you would not want 1M
blocks for memmap but rather 4k pages. So yes you would need to use
vmalloc space (or reserve another virtual range for that purpose).
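
As a userspace toy model of that idea (every name below is invented
for illustration; the real code is mm/sparse-vmemmap.c and differs in
detail), absent regions can all point at one shared zero-filled page so
that pointer arithmetic on the virtually contiguous memmap stays valid:

#include <stdint.h>
#include <stdlib.h>

#define VPAGE_SIZE 4096u

/* One shared, zeroed page stands in for every absent region. */
static uint8_t null_page[VPAGE_SIZE];

/*
 * backing[i] models the pte for the i-th 4k block of the memmap;
 * present[i] says whether real memory exists behind that block.
 */
static void populate(void *backing[], const int present[], int nr)
{
	for (int i = 0; i < nr; i++)
		backing[i] = present[i] ? calloc(1, VPAGE_SIZE)
					: null_page;
}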

From: Dave Hansen
On Thu, 2010-07-29 at 23:14 +0100, Russell King - ARM Linux wrote:
> What we need is something which allows us to handle memory scattered
> in several regions of the physical memory map, each bank being a
> variable size.

Russell, it does sound like you have a pretty pathological case here. :)
It's not one that we've really attempted to address on any other
architectures.

Just to spell it out, if you have 4GB of physical address space, with
512k sections, you need 8192 sections, which means 8192*8 bytes, so it'd
eat 64k of memory. That's the normal SPARSEMEM case.

SPARSEMEM_EXTREME would be a bit different. It's a 2-level lookup.
You'd have 16 "section roots", each representing 256MB of address space.
Each time we put memory under one of those roots, we'd fill in a
512-section second-level table, which is designed to always fit into one
page. If you start at 256MB, you won't waste all those entries.

The disadvantage of SPARSEMEM_EXTREME is that it costs you the extra
level in the lookup. The space loss in arm's case would only be 16
pointers, which would more than be made up for by the other gains.
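
That two-level lookup is essentially what the kernel already has;
simplified from __nr_to_section() in include/linux/mmzone.h:

#ifdef CONFIG_SPARSEMEM_EXTREME
static inline struct mem_section *__nr_to_section(unsigned long nr)
{
	/* First level: one root pointer per (here) 256MB of space. */
	if (!mem_section[SECTION_NR_TO_ROOT(nr)])
		return NULL;
	/* Second level: index into the root's one-page section table. */
	return &mem_section[SECTION_NR_TO_ROOT(nr)][nr & SECTION_ROOT_MASK];
}
#endif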

The other case where it really makes no sense is when you're populating
a single section (or a small number of them) evenly across the address space.
For instance, let's say you have 16 512k banks, evenly spaced at 256MB
intervals:

512k@0x00000000
512k@0x10000000
512k@0x20000000
...
512k@0xF0000000

If you use SPARSEMEM_EXTREME on that it will degenerate to having the
same memory consumption as classic SPARSEMEM, along with the extra
lookup of EXTREME. But, I haven't heard you say that you have this kind
of configuration, yet. :)
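
(Putting numbers on that degenerate case: each of the 16 banks falls
under a different 256MB root, so all 16 second-level pages get
allocated. That is 16 * 4k = 64k, the same as the 8192 * 8 bytes = 64k
the classic SPARSEMEM array costs, with the extra indirection on top.)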

SPARSEMEM_EXTREME is really easy to test. You just have to set it in
your .config. To get much use out of it, you'd also need to make the
SECTION_SIZE smaller, like the 512k we were talking about.


-- Dave

From: Minchan Kim
On Fri, Jul 30, 2010 at 5:55 AM, Dave Hansen <dave@linux.vnet.ibm.com> wrote:
> On Thu, 2010-07-29 at 19:33 +0100, Russell King - ARM Linux wrote:
>> And no, setting the sparse section size to 512kB doesn't work - memory is
>> offset by 256MB already, so you need a sparsemem section array of 1024
>> entries just to cover that - with the full 256MB populated, that's 512
>> unused entries followed by 512 used entries. That too is going to waste
>> memory like nobody's business.
>
> Sparsemem could use some work in the case where memory doesn't start at
> 0x0. But, it doesn't seem like it would be _too_ oppressive to add.
> It's literally just adding an offset to all of the places where a
> physical address is stuck into the system. It'll make a few of the
> calculations longer, of course, but it should be manageable.
>
> Could you give some full examples of how the memory is laid out on these
> systems? I'm having a bit of a hard time visualizing it.
>
> As Christoph mentioned, SPARSEMEM_EXTREME might be viable here, too.
>
> If you free up parts of the mem_map[] array, how does the buddy
> allocator still work? I thought we required the 'struct page's to be
> contiguous and present for at least 2^MAX_ORDER-1 pages in one go.

I think in that case the arch should define CONFIG_HOLES_IN_ZONE to
prevent the crash. But I am not sure the ARM platforms with such holes
have been using it properly. Kujkin's problem happens not in the buddy
allocator but while walking the whole pfn range after an echo to
min_free_kbytes.
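
For reference, those walks follow the usual pattern sketched below; it
is the pfn_valid() check (or pfn_valid_within() under
CONFIG_HOLES_IN_ZONE) that is supposed to keep them off struct pages
that were never allocated. This is only the simplified shape of the
loop; the real ones live in mm/page_alloc.c:

struct page *page;
unsigned long pfn;
unsigned long end = zone->zone_start_pfn + zone->spanned_pages;

for (pfn = zone->zone_start_pfn; pfn < end; pfn++) {
	if (!pfn_valid(pfn))	/* on ARM, only section-granular */
		continue;
	page = pfn_to_page(pfn);
	/* ... touching *page here can oops if part of this section's
	 * mem_map was freed even though pfn_valid() said yes ... */
}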

--
Kind regards,
Minchan Kim