From: Yinghai Lu on
>
> That action means you absolutely don't value our feedback at all.

[PATCH 01/20] x86: add find_e820_area_node
addresses your concern that early_res did not handle memory that spans multiple nodes.

YH
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Yinghai Lu on
On 03/21/2010 10:12 PM, Benjamin Herrenschmidt wrote:
> It -may- well be that adapting x86 to lmb isn't a practical approach,
> but if that was the case, then please justify why with precise technical
> reasons, which we can discuss then in details and make a decision based
> on that.

1. lmb merges regions when you add a new reserved region.
early_res doesn't do that merge, so it can later detect an incorrect free.
<free_early_partial was added recently, for per-cpu setup only>
2. The e820 map has more memory types than just RAM: it includes RAM, reserved, ACPI, ACPI NVS, and type 9?, and KERN_RESERVED...
3. In early_res, every range has a name tag.
4. early_res is array based; it can automatically double the array size and copy the old array into the new one. The first entry in the new array is for the array itself.
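
To make points 1 and 3 concrete, here is a minimal sketch of the two bookkeeping styles. It is not the kernel's lmb or early_res code: all names (region_table, named_table, reserve_merging, reserve_named, free_named) are hypothetical, the tables are fixed-size, and the merging variant only coalesces with one directly adjacent entry.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_REGIONS 16

/* Merging table (lmb-like sketch): only base and size survive. */
struct region { unsigned long base, size; };
struct region_table { size_t cnt; struct region r[MAX_REGIONS]; };

/* Add a reservation, coalescing it with a directly adjacent entry. */
int reserve_merging(struct region_table *t, unsigned long base,
                    unsigned long size)
{
    size_t i;

    assert(size > 0);
    for (i = 0; i < t->cnt; i++) {
        struct region *r = &t->r[i];

        if (r->base + r->size == base) {   /* new range follows r */
            r->size += size;
            return 0;
        }
        if (base + size == r->base) {      /* new range precedes r */
            r->base = base;
            r->size += size;
            return 0;
        }
    }
    if (t->cnt == MAX_REGIONS)
        return -1;
    t->r[t->cnt].base = base;
    t->r[t->cnt].size = size;
    t->cnt++;
    return 0;
}

/* Non-merging table (early_res-like sketch): one named slot per call. */
struct named_region { unsigned long start, end; const char *name; };
struct named_table { size_t cnt; struct named_region r[MAX_REGIONS]; };

/* Record [start, end) under its own name tag; never merges. */
int reserve_named(struct named_table *t, unsigned long start,
                  unsigned long end, const char *name)
{
    assert(start < end);
    if (t->cnt == MAX_REGIONS)
        return -1;
    t->r[t->cnt].start = start;
    t->r[t->cnt].end = end;
    t->r[t->cnt].name = name;
    t->cnt++;
    return 0;
}

/* Free only a range that exactly matches an earlier reservation. */
int free_named(struct named_table *t, unsigned long start,
               unsigned long end)
{
    size_t i;

    for (i = 0; i < t->cnt; i++) {
        if (t->r[i].start == start && t->r[i].end == end) {
            t->r[i] = t->r[t->cnt - 1];    /* fill the hole */
            t->cnt--;
            return 0;
        }
    }
    return -1;                             /* mismatched free detected */
}
```

Reserving [0x1000, 0x2000) and then [0x2000, 0x3000) leaves one entry in the merging table and two in the named table, so only the named table can reject an attempt to free [0x1000, 0x3000) as a single range.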

If we want x86 to use lmb, the e820 map and lmb.memory would duplicate each other.
lmb.memory would also need to support more types; otherwise we would still have to go back to e820 to check for e820 reserved ranges etc.
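
As a rough illustration of why the type information matters, the sketch below tags each range with a type and answers "is this range entirely of type X?" from a single map. The enum values, struct typed_range and range_is_type() are hypothetical names for illustration only; they are not the e820 or lmb definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical type tags, loosely following the kinds listed above. */
enum mem_type {
    MEM_RAM,
    MEM_RESERVED,
    MEM_ACPI,
    MEM_ACPI_NVS,
    MEM_KERN_RESERVED
};

struct typed_range {
    unsigned long start;   /* inclusive */
    unsigned long end;     /* exclusive */
    enum mem_type type;
};

/* Return 1 if [start, end) lies inside a single entry of the given type. */
int range_is_type(const struct typed_range *map, size_t n,
                  unsigned long start, unsigned long end,
                  enum mem_type type)
{
    size_t i;

    assert(start < end);
    for (i = 0; i < n; i++) {
        if (map[i].type == type &&
            map[i].start <= start && end <= map[i].end)
            return 1;
    }
    return 0;
}
```

A map that stores only untyped RAM ranges cannot answer the MEM_RESERVED or MEM_ACPI queries, which is the reason a second lookup in the firmware map would still be needed.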


Thanks

Yinghai
From: Ingo Molnar on

( Cc:-ed Andrew and Linus as this is a general design/policy matter wrt.
memory management. )

* David Miller <davem(a)davemloft.net> wrote:

> From: Yinghai Lu <yinghai(a)kernel.org>
> Date: Sun, 21 Mar 2010 21:28:38 -0700
>
> >>
> >> That action means you absolutely don't value our feedback at all.
> >
> > [PATCH 01/20] x86: add find_e820_area_node
> > addresses your concern that early_res did not handle memory that spans multiple nodes.
>
> Now I know that you _REALLY_ aren't listening to us.

[ He has done a bit more than just to simply listen: he seems to have written
a patch which he thinks is addressing the concerns you pointed out. It might
not be the response you wished for (and it might be inadequate) but it
sure gives me the impression of him listening to you - unless by 'listening'
you mean 'follow my exact opinion without argument'. ]

> We said to use LMB because 1) it already exists 2) many platforms have been
> using it for years and 3) it doesn't lack the features you're now having to
> add to e820.

The thing is, lib/lmb.c was librarized two years ago by you (long after
early_res had been written for x86), but was not properly integrated into the
core kernel nor into x86. It was first suggested by you in the early_res
context about ten days ago, when Yinghai started posting Sparc64 patches.

Which is about half a year after the whole very difficult early_res/bootmem
work was started by Yinghai :-(

I dont mind LMB per se, logically it seems quite similar to the early_res bits
Yinghai has generalized (to a certain degree), and is quite a bit cleaner as
you are writing very clean code.

Note the other side of the coin: LMB appears to be deployed on only 4 non-x86
architectures that muster ~1% of the Linux boxes while early_res is deployed
on more than 95%.

So there's a very real hardship of testing and conversion here that we cannot
ignore and an even better path may be to gradually transform the more tested
and more deployed early_res code to meet the interface details of LMB.

Please also realize the difficulties Yinghai has gone through already:
converting and generalizing _all_ of the x86 early allocation code to a more
generic core kernel approach, with essentially zero interest from _any_
non-x86 person ...

Those early_res patches were posted all over on lkml, it was literally
hundreds of difficult patches, and now, months down the line, after we've
tested and upstreamed it (with many nasty regressions fixed on x86 during the
development of it) you come with a rigid "do it some other way, convert all of
x86 over again or else" position.

I really wish non-x86 architectures appreciated (and helped) the core kernel
work x86 is doing, because it is subsidizing non-x86 architectures all the
time.

For example when LMB was plopped into lib/lmb.c in 2008 why was it not ported
to x86, our most popular architecture? Did you consider posting LMB patches
for x86 instead of expecting Yinghai to post Sparc64, PowerPC, SH and
Microblaze patches?

Anyway, i'm sure we can work out an approach, and yes, LMB looks pretty good
and could be picked up if it can be done gradually - given some mutual
willingness to work on this as equals.

Thanks,

Ingo
From: Ingo Molnar on

* Paul Mackerras <paulus(a)samba.org> wrote:

> And I don't see the point of moving the x86 e820 stuff into the kernel
> directory. [...]

I dont see the point of that either - that is a mistake. e820 is an x86 bios
call and we shouldnt name a generic mechanism after that. e820 is absolutely
messy and has no place anywhere beyond x86.

The main technical argument i see is 'early_res versus LMB'. Even there i'd
prefer LMB from a technical quality POV.

> Well I personally don't mind if x86 uses early_res or whatever other code in
> arch/x86 to handle the problems that arise from deficient firmware. I just
> don't see any value in converting powerpc or sparc64 over to using ~2000
> lines of early_res/fw_memmap code where the existing ~500 lines of lmb code
> is working just fine.

Lets put it this way then: do you see any point in PowerPC making use of a 10+
million lines of code kernel that is being mainly (80%+) financed, developed,
tested and deployed by people who care about x86 mostly?

If yes then it seems like a pretty damn good deal to me for PowerPC to go
beyond its narrow short-term self-interest and work towards generalizations
more actively, and even consider touching its 500 lines of lmb code ...

I dont know how many times we've accommodated non-x86 architectures in
various pieces of kernel code.

Obviously if there's bloat affecting PowerPC then that can be addressed via
technical measures. But we really shouldnt leave the slightly incompatible
early allocators in place. (we shouldnt have let them get created in the first
place, but that is water under the bridge.)

Thanks,

Ingo
From: Ingo Molnar on

* Benjamin Herrenschmidt <benh(a)kernel.crashing.org> wrote:

> On Mon, 2010-03-22 at 14:05 +0100, Ingo Molnar wrote:
> > * Paul Mackerras <paulus(a)samba.org> wrote:
> >
> > > And I don't see the point of moving the x86 e820 stuff into the kernel
> > > directory. [...]
> >
> > I dont see the point of that either - that is a mistake. e820 is an x86 bios
> > call and we shouldnt name a generic mechanism after that. e820 is absolutely
> > messy and has no place anywhere beyond x86.
> >
> > The main technical argument i see is 'early_res versus LMB'. Even there i'd
> > prefer LMB from a technical quality POV.
>
> Then we have no argument. The point is, we object to that fw_memmap/e820
> stuff taking over for non-x86 architectures. We aren't saying that x86
> -must- move to LMB, but if the wish is to have a common implementation in
> generic code across all archs, -then- we object to it being e820.

Ok, just in case i wasnt clear enough in my first reply (and i guess your mail
means i wasnt): that whole-sale move of e820 into kernel/fw_memmap.c is a
total non-starter as far as i'm concerned.

And i kind of like the 'logical memory block' name - it is more intuitive than
'early_res' (which was always a misnomer IMO, just couldnt find a better name
for it and it stuck with us).

So no arguments from me at all about the code quality aspects - i just wanted
to highlight the huge amount of non-trivial work Yinghai has invested into
this already, with little external help, and that it would be nice
to minimize the upsetting of related x86 code if possible. Please help him out
with more specific suggestions about how the two memory allocation spaces
could be unified best, to serve the needs of all these architectures - if you
have some spare time.

Thanks,

Ingo