memblock: Add memblock_find_in

Prev: [PATCH] gen_nand: Fix support for multiple chips
Next: mmotm 2010-07-27 - WARNING in ec_parse_io_ports()

From: Benjamin Herrenschmidt on 28 Jul 2010 01:40

On Thu, 2010-07-22 at 11:21 -0700, Yinghai Lu wrote:
> it is a wrapper for memblock_find_base
>
> make it more easy for x86 to use memblock. ( rebase )
> x86 early_res is using find/reserve pattern instead of alloc.
>
> keep it in weak version, so later We can use x86 own version if needed.
> also We need it in mm/memblock.c, so one caller mm/page_alloc.c could get compiled
>
> -v2: Change name to memblock_find_in_range() according to Michael Ellerman
> -v3: Add generic weak version __memblock_find_in_range()
> so keep the path for fallback to x86 version that handle from low
> -v4: use 0 for failing path
> -v5: use MEMBLOCK_ERROR again
> -v6: remove __memblock_find_in_range()

It's very gross to have this weak and not memblock_find_base()... IE.

You create a new function defined as a wrapper on an existing one to
provide an easier set of arguments ... but also make it weak so the
arch can completely change its semantics without actually changing
the semantics of the function it wraps.

This is going to cause confusion and bugs. I'm adding the patch without
the weak bit to my branch for now, we need to discuss what is the best
approach for x86 here. Might be to use a different function. I don't
understand yet -why- x86 needs to override it, maybe the right way is
to reserve things more intelligently on x86 ?

In any case, you can always use your own wrapper there if needed

Cheers,
Ben.

> Signed-off-by: Yinghai Lu <yinghai(a)kernel.org>
> ---
> include/linux/memblock.h | 2 ++
> mm/memblock.c | 8 ++++++++
> 2 files changed, 10 insertions(+), 0 deletions(-)
>
> diff --git a/include/linux/memblock.h b/include/linux/memblock.h
> index 751a4eb..61b22eb 100644
> --- a/include/linux/memblock.h
> +++ b/include/linux/memblock.h
> @@ -48,6 +48,8 @@ extern struct memblock_region memblock_reserved_init_regions[];
> #define memblock_dbg(fmt, ...) \
> if (memblock_debug) printk(KERN_INFO pr_fmt(fmt), ##__VA_ARGS__)
>
> +u64 memblock_find_in_range(u64 start, u64 end, u64 size, u64 align);
> +
> extern void __init memblock_init(void);
> extern void __init memblock_analyze(void);
> extern long memblock_add(phys_addr_t base, phys_addr_t size);
> diff --git a/mm/memblock.c b/mm/memblock.c
> index 7471dac..ca7de91 100644
> --- a/mm/memblock.c
> +++ b/mm/memblock.c
> @@ -156,6 +156,14 @@ static phys_addr_t __init memblock_find_base(phys_addr_t size, phys_addr_t align
> return MEMBLOCK_ERROR;
> }
>
> +/*
> + * Find a free area with specified alignment in a specific range.
> + */
> +u64 __init __weak memblock_find_in_range(u64 start, u64 end, u64 size, u64 align)
> +{
> + return memblock_find_base(size, align, start, end);
> +}
> +
> static void __init_memblock memblock_remove_region(struct memblock_type *type, unsigned long r)
> {
> unsigned long i;

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

From: Yinghai Lu on 28 Jul 2010 02:10

On 07/27/2010 10:36 PM, Benjamin Herrenschmidt wrote:
> On Thu, 2010-07-22 at 11:21 -0700, Yinghai Lu wrote:
>> it is a wrapper for memblock_find_base
>>
>> make it more easy for x86 to use memblock. ( rebase )
>> x86 early_res is using find/reserve pattern instead of alloc.
>>
>> keep it in weak version, so later We can use x86 own version if needed.
>> also We need it in mm/memblock.c, so one caller mm/page_alloc.c could get compiled
>>
>> -v2: Change name to memblock_find_in_range() according to Michael Ellerman
>> -v3: Add generic weak version __memblock_find_in_range()
>> so keep the path for fallback to x86 version that handle from low
>> -v4: use 0 for failing path
>> -v5: use MEMBLOCK_ERROR again
>> -v6: remove __memblock_find_in_range()
>
> It's very gross to have this weak and not memblock_find_base()... IE.
>
> You create a new function defined as a wrapper on an existing one to
> provide an easier set of arguments ... but also make it weak so the
> arch can completely change its semantics without actually changing
> the semantics of the function it wraps.
>
> This is going to cause confusion and bugs. I'm adding the patch without
> the weak bit to my branch for now, we need to discuss what is the best
> approach for x86 here. Might be to use a different function. I don't
> understand yet -why- x86 needs to override it, maybe the right way is
> to reserve things more intelligently on x86 ?

again there is a difference between low/high to high/low.

for example:
high/low allocation, from first kernel to kexec second kernel, always work fine except system with Qlogic card.
because Qlogic card is using main RAM as EFT etc for card's FW log trace. second kernel have not idea that those RAM
is used by first kernel for that purpose. that the CARD still use that between two kernels.
second kernel could have crash it try to use those ram.

low/high allocation seems to be safe, second kernel can slip to boot fine.

Thanks

Yinghai

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

From: H. Peter Anvin on 28 Jul 2010 02:30

On 07/27/2010 10:36 PM, Benjamin Herrenschmidt wrote:
>
> It's very gross to have this weak and not memblock_find_base()... IE.
>
> You create a new function defined as a wrapper on an existing one to
> provide an easier set of arguments ... but also make it weak so the
> arch can completely change its semantics without actually changing
> the semantics of the function it wraps.
>
> This is going to cause confusion and bugs. I'm adding the patch without
> the weak bit to my branch for now, we need to discuss what is the best
> approach for x86 here. Might be to use a different function. I don't
> understand yet -why- x86 needs to override it, maybe the right way is
> to reserve things more intelligently on x86 ?
>
> In any case, you can always use your own wrapper there if needed
>

I'm really confused by this as well. First of all this only looks like
a prototype which swizzles the argument order, which is a good start for
making problems.

The second thing is that the proposed x86 code seems to do something I
would consider to be a core service, which is find an allocation outside
any reserved region, but it does so by looking at two different data
structures. Instead the logical thing would be to knock a reserved
block out of the available set, so that there is always a data structure
which contains the currently free and available memory. I think the
best thing would be if the same data structure could also handle
reserved memory types (by carrying an attribute), but if that is not
possible, there can be a reserved memblock structure and an available
memblock structure, alternatively (and equivalently) an "all memory"
memblock structure and an available memblock structure, but unavailable
memory should not be in the available structure.

-hpa

--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

From: H. Peter Anvin on 28 Jul 2010 02:40

On 07/27/2010 11:08 PM, Yinghai Lu wrote:
>
> for example:
> high/low allocation, from first kernel to kexec second kernel, always work fine except system with Qlogic card.
> because Qlogic card is using main RAM as EFT etc for card's FW log trace. second kernel have not idea that those RAM
> is used by first kernel for that purpose. that the CARD still use that between two kernels.
> second kernel could have crash it try to use those ram.
>

Uhm, no. That's a bug in the Qlogic driver not shutting the card down
cleanly. Hacking around that in memory allocation order is braindamaged
in the extreme. kexec *cannot* be safe in any way if we don't shut down
pending DMA, and what you describe above is DMA.

> low/high allocation seems to be safe, second kernel can slip to boot fine.

Low to high is just broken. Low memory is a special, desirable
resource, and we should minimize the use of it.

-hpa

--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

From: Yinghai Lu on 28 Jul 2010 03:20

On 07/27/2010 11:38 PM, H. Peter Anvin wrote:
> On 07/27/2010 11:08 PM, Yinghai Lu wrote:
>>
>> for example:
>> high/low allocation, from first kernel to kexec second kernel, always work fine except system with Qlogic card.
>> because Qlogic card is using main RAM as EFT etc for card's FW log trace. second kernel have not idea that those RAM
>> is used by first kernel for that purpose. that the CARD still use that between two kernels.
>> second kernel could have crash it try to use those ram.
>>
>
> Uhm, no. That's a bug in the Qlogic driver not shutting the card down
> cleanly. Hacking around that in memory allocation order is braindamaged
> in the extreme. kexec *cannot* be safe in any way if we don't shut down
> pending DMA, and what you describe above is DMA.
>

the problem is later if the user hit the problem, it will be called "Regression" after bisecting to the memblock/x86 changes.
because low/high does work before.

BTW, that design from qlogic to save log in RAM is not good one, they may save some cents for the ram in card.

other vendors seems put log/trace in the ram on card.

Thanks

Yinghai
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

| Next | Last
Pages: 1 2 3
Prev: [PATCH] gen_nand: Fix support for multiple chips
Next: mmotm 2010-07-27 - WARNING in ec_parse_io_ports()

memblock: Add memblock_find_in_range()