From: Cong Wang on
On 07/22/10 14:28, Eric W. Biederman wrote:
> Amerigo Wang <amwang(a)redhat.com> writes:
>
>> Currently KEXEC_SEGMENT_MAX is only 16, which is too small for machines with
>> many memory ranges. Increase this hard limit to 1024, which is reasonably large,
>> and change ->segment from a static array to dynamically allocated memory.
>
> ???
>
> This should be about segments in the executable being loaded. What
> executable has one segment for each range of physical memory?
>
> Not that generalizing this is a bad idea, but with a comment that
> seems entirely wrong, I am wondering what the problem really is.
>

Ah, I think Neil should explain this.

He made a patch that produces many memory ranges, which causes kexec
to fail to load the kernel. Increasing this limit, and the corresponding
one in kexec-tools, fixes the problem. His patch is not in upstream
kexec-tools, AFAIK.

However, even if we don't consider that patch, isn't 16 too small anyway?

Thanks.
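A minimal user-space sketch of the change being discussed, replacing the fixed `segment[KEXEC_SEGMENT_MAX]` array with heap allocation (the struct fields here are a simplified model of the kernel's `struct kexec_segment`, and the kernel itself would use kcalloc rather than calloc):

```c
#include <stdlib.h>

#define KEXEC_SEGMENT_MAX 1024  /* raised from 16 in the patch under discussion */

/* Simplified model of the kernel's struct kexec_segment. */
struct kexec_segment {
	void *buf;		/* user buffer holding the segment contents */
	unsigned long bufsz;	/* bytes valid in buf */
	unsigned long mem;	/* destination physical address */
	unsigned long memsz;	/* bytes reserved at the destination */
};

/* Allocate the segment list dynamically instead of embedding a
 * KEXEC_SEGMENT_MAX-sized array in struct kimage.  Returns NULL when
 * nr_segments is zero or exceeds the hard limit. */
struct kexec_segment *alloc_segments(unsigned long nr_segments)
{
	if (nr_segments == 0 || nr_segments > KEXEC_SEGMENT_MAX)
		return NULL;
	return calloc(nr_segments, sizeof(struct kexec_segment));
}
```

With this shape, a load with more than 16 segments (the old limit) succeeds, while absurd requests are still rejected by the raised limit.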

--
The opposite of love is not hate, it's indifference.
- Elie Wiesel
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Eric W. Biederman on
huang ying <huang.ying.caritas(a)gmail.com> writes:

> On Thu, Jul 22, 2010 at 3:08 PM, Eric W. Biederman
> <ebiederm(a)xmission.com> wrote:
>> Cong Wang <amwang(a)redhat.com> writes:
>>
>>> On 07/22/10 14:28, Eric W. Biederman wrote:
>>>> Amerigo Wang <amwang(a)redhat.com> writes:
>>>>
>>>>> Currently KEXEC_SEGMENT_MAX is only 16, which is too small for machines with
>>>>> many memory ranges. Increase this hard limit to 1024, which is reasonably large,
>>>>> and change ->segment from a static array to dynamically allocated memory.
>>>>
>>>> ???
>>>>
>>>> This should be about segments in the executable being loaded.  What
>>>> executable has one segment for each range of physical memory?
>>>>
>>>> Not that generalizing this is a bad idea, but with a comment that
>>>> seems entirely wrong, I am wondering what the problem really is.
>>>>
>>>
>>> Ah, I think Neil should explain this.
>>>
>>> He made a patch that produces many memory ranges, which causes kexec
>>> to fail to load the kernel. Increasing this limit, and the corresponding
>>> one in kexec-tools, fixes the problem. His patch is not in upstream
>>> kexec-tools, AFAIK.
>>>
>>> However, even if we don't consider that patch, isn't 16 too small anyway?
>>
>> Generally you just need one physical hunk for the code, maybe a second
>> for the initrd.
>>
>> It is perfectly fine to raise the number of segments, as it doesn't
>> affect the ABI, but it wants a good explanation of what kind of weird
>> application wants to write all over memory when it is loaded.
>
> kexec can be used to load not only kernel images, but also more
> complex images such as hibernation images. So I think it is good to
> raise the number of segments.

Totally reasonable.

And in all fairness the patch does a good job of raising the limit.

However, if that is the goal, 1024 is probably a bit low, as I believe
SGI has built machines with that many nodes. Still, after the patch
under discussion, 1024 was only a limit in a header file, so it can
be trivially changed.

Eric
From: Cong Wang on
On 07/26/10 20:19, Eric W. Biederman wrote:
> Cong Wang <amwang(a)redhat.com> writes:
>
>> So, what is a better number? 2048? :)
>
> I was thinking something a little ridiculous like 10K or 64K.
>
> Assuming the usage you care about is something like hibernate on a
> machine with disjoint memory where you truly need one segment for
> each memory region.
>
> The only point of a limit at all once we introduce dynamic allocation
> is to catch buggy apps.
>

Ok, I will change that number and improve the changelog.

Thanks.
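Eric's point above, that once the array is dynamically allocated the limit exists only to catch buggy applications, can be sketched as follows (a hypothetical validation helper, not the kernel's actual code, using the 16K figure from the discussion):

```c
#include <errno.h>

#define KEXEC_SEGMENT_MAX 16384  /* 16K, per the discussion above */

/* With dynamic allocation the limit no longer sizes any array; it only
 * serves as a sanity check to reject requests from buggy callers. */
int check_nr_segments(unsigned long nr_segments)
{
	if (nr_segments > KEXEC_SEGMENT_MAX)
		return -EINVAL;
	return 0;
}
```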


From: Cong Wang on
On 07/27/10 18:00, Milton Miller wrote:
> [ Added kexec at lists.infradead.org and linuxppc-dev(a)lists.ozlabs.org ]
>
>>
>> Currently KEXEC_SEGMENT_MAX is only 16, which is too small for machines with
>> many memory ranges. When hibernating on a machine with disjoint memory, we
>> need one segment for each memory region. Increase this hard limit to 16K,
>> which is reasonably large.
>>
>> And change ->segment from a static array to dynamically allocated memory.
>>
>> Cc: Neil Horman <nhorman(a)redhat.com>
>> Cc: huang ying <huang.ying.caritas(a)gmail.com>
>> Cc: Eric W. Biederman <ebiederm(a)xmission.com>
>> Signed-off-by: WANG Cong <amwang(a)redhat.com>
>>
>> ---
>> diff --git a/arch/powerpc/kernel/machine_kexec_64.c b/arch/powerpc/kernel/machine_kexec_64.c
>> index ed31a29..f115585 100644
>> --- a/arch/powerpc/kernel/machine_kexec_64.c
>> +++ b/arch/powerpc/kernel/machine_kexec_64.c
>> @@ -131,10 +131,7 @@ static void copy_segments(unsigned long ind)
>> void kexec_copy_flush(struct kimage *image)
>> {
>> long i, nr_segments = image->nr_segments;
>> - struct kexec_segment ranges[KEXEC_SEGMENT_MAX];
>> -
>> - /* save the ranges on the stack to efficiently flush the icache */
>> - memcpy(ranges, image->segment, sizeof(ranges));
>> + struct kexec_segment range;
>
> I'm glad you found our copy on the stack and removed the stack overflow
> that comes with this bump, but ...
>
>>
>> /*
>> * After this call we may not use anything allocated in dynamic
>> @@ -148,9 +145,11 @@ void kexec_copy_flush(struct kimage *image)
>> * we need to clear the icache for all dest pages sometime,
>> * including ones that were in place on the original copy
>> */
>> - for (i = 0; i < nr_segments; i++)
>> - flush_icache_range((unsigned long)__va(ranges[i].mem),
>> - (unsigned long)__va(ranges[i].mem + ranges[i].memsz));
>> + for (i = 0; i < nr_segments; i++) {
>> + memcpy(&range, &image->segment[i], sizeof(range));
>> + flush_icache_range((unsigned long)__va(range.mem),
>> + (unsigned long)__va(range.mem + range.memsz));
>> + }
>> }
>
> This is executed after the copy, so as it says,
> "we may not use anything allocated in dynamic memory".
>
> We could allocate control pages to copy the segment list into.
> Actually ppc64 doesn't use the existing control page, but that
> is only 4kB today.
>
> We need the list to icache-flush all the pages in all the segments,
> as the indirect list doesn't have pages that were allocated at
> their destination.
>
> Or maybe the icache flush should be done in the generic code
> like it does for crash load segments?
>

I don't get the point here; according to the comment,
it is copied onto the stack for efficiency.
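For a sense of the stack overflow Milton mentions: the arithmetic below uses a simplified 64-bit layout of `struct kexec_segment` (the real struct's size may differ), but the conclusion holds either way. At the old limit of 16 the on-stack copy was a few hundred bytes; at 16K it would be hundreds of kilobytes, far beyond a kernel stack, which is why the revised loop copies one struct at a time.

```c
#include <stddef.h>

/* Simplified 64-bit layout of the kernel's struct kexec_segment
 * (four pointer-sized fields). */
struct kexec_segment {
	void *buf;
	unsigned long bufsz;
	unsigned long mem;
	unsigned long memsz;
};

/* Bytes the old on-stack array in kexec_copy_flush() would occupy
 * for a given KEXEC_SEGMENT_MAX. */
unsigned long stack_copy_bytes(unsigned long segment_max)
{
	return segment_max * sizeof(struct kexec_segment);
}
```

On a typical 64-bit build this gives 512 bytes for 16 segments but roughly 512 KiB for 16K segments, which cannot live on a kernel stack.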

From: Cong Wang on
(Ping Milton...)

On 07/29/10 14:42, Cong Wang wrote:
> On 07/27/10 18:00, Milton Miller wrote:
>> [ Added kexec at lists.infradead.org and linuxppc-dev(a)lists.ozlabs.org ]
>>
>>>
>>> Currently KEXEC_SEGMENT_MAX is only 16, which is too small for machines
>>> with many memory ranges. When hibernating on a machine with disjoint
>>> memory, we need one segment for each memory region. Increase this hard
>>> limit to 16K, which is reasonably large.
>>>
>>> And change ->segment from a static array to dynamically allocated
>>> memory.
>>>
>>> Cc: Neil Horman <nhorman(a)redhat.com>
>>> Cc: huang ying <huang.ying.caritas(a)gmail.com>
>>> Cc: Eric W. Biederman <ebiederm(a)xmission.com>
>>> Signed-off-by: WANG Cong <amwang(a)redhat.com>
>>>
>>> ---
>>> diff --git a/arch/powerpc/kernel/machine_kexec_64.c
>>> b/arch/powerpc/kernel/machine_kexec_64.c
>>> index ed31a29..f115585 100644
>>> --- a/arch/powerpc/kernel/machine_kexec_64.c
>>> +++ b/arch/powerpc/kernel/machine_kexec_64.c
>>> @@ -131,10 +131,7 @@ static void copy_segments(unsigned long ind)
>>> void kexec_copy_flush(struct kimage *image)
>>> {
>>> long i, nr_segments = image->nr_segments;
>>> - struct kexec_segment ranges[KEXEC_SEGMENT_MAX];
>>> -
>>> - /* save the ranges on the stack to efficiently flush the icache */
>>> - memcpy(ranges, image->segment, sizeof(ranges));
>>> + struct kexec_segment range;
>>
>> I'm glad you found our copy on the stack and removed the stack overflow
>> that comes with this bump, but ...
>>
>>>
>>> /*
>>> * After this call we may not use anything allocated in dynamic
>>> @@ -148,9 +145,11 @@ void kexec_copy_flush(struct kimage *image)
>>> * we need to clear the icache for all dest pages sometime,
>>> * including ones that were in place on the original copy
>>> */
>>> - for (i = 0; i < nr_segments; i++)
>>> - flush_icache_range((unsigned long)__va(ranges[i].mem),
>>> - (unsigned long)__va(ranges[i].mem + ranges[i].memsz));
>>> + for (i = 0; i < nr_segments; i++) {
>>> + memcpy(&range, &image->segment[i], sizeof(range));
>>> + flush_icache_range((unsigned long)__va(range.mem),
>>> + (unsigned long)__va(range.mem + range.memsz));
>>> + }
>>> }
>>
>> This is executed after the copy, so as it says,
>> "we may not use anything allocated in dynamic memory".
>>
>> We could allocate control pages to copy the segment list into.
>> Actually ppc64 doesn't use the existing control page, but that
>> is only 4kB today.
>>
>> We need the list to icache-flush all the pages in all the segments,
>> as the indirect list doesn't have pages that were allocated at
>> their destination.
>>
>> Or maybe the icache flush should be done in the generic code
>> like it does for crash load segments?
>>
>
> I don't get the point here; according to the comment,
> it is copied onto the stack for efficiency.
>

