From: KAMEZAWA Hiroyuki on
On Tue, 20 Apr 2010 12:58:43 +0900
Minchan Kim <minchan.kim(a)gmail.com> wrote:

> On Tue, Apr 20, 2010 at 12:07 PM, KAMEZAWA Hiroyuki
> <kamezawa.hiroyu(a)jp.fujitsu.com> wrote:
> > On Tue, 20 Apr 2010 11:39:46 +0900
> >> Dumb question. Why can't we call arch_alloc_page and kernel_map_pages
> >> as interrupt disabled? It's deadlock issue or latency issue?
> >> I don't found any comment about it.
> >> It should have added the comment around that functions. :)
> >>
> >
> > I guess it's from the same reason as vfree(), which can't be called under
> > irq-disabled.
> >
> > Both of them has to flush TLB of all cpus. At flushing TLB (of other cpus), cpus has
> > to send IPI via smp_call_function. What I know from old stories is below.
> >
> > At sendinf IPI, usual sequence is following. (This may be old.)
> >
> >        spin_lock(&ipi_lock);
> >                set up cpu mask for getting notification from other cpu for declearing
> >                "I received IPI and finished my own work".
> >        spin_unlock(&ipi_lock);
> >
> > Then,
> >          CPU0                             CPU1
> >
> >    irq_disable (somewhere)             spin_lock
> >                                        send IPI and wait for notification.
> >    spin_lock()
> >
> > deadlock.  Seeing decription of kernel/smp.c::smp_call_function_many(), it says
> > this function should not be called under irq-disabled.
> > (Maybe the same kind of spin-wait deadlock can happen.)
> >
>
> Thanks for kind explanation.
> Actually I guessed TLB issue but I can't find any glue point which
> connect tlb flush to smp_call_function_xxx. :(
>
> Now look at the __native_flush_tlb_global.
> It just read and write cr4 with just mask off X86_CR4_PGE.
> So i don't know how connect this and smp_schedule_xxxx.
> Hmm,, maybe APIC?
>
> Sorry for dumb question.
>
Hmm...seeing again,

arch/x86/mm/pageattr.c::kernel_map_pages() says:

1293 /*
1294 * We should perform an IPI and flush all tlbs,
1295 * but that can deadlock->flush only current cpu:
1296 */

Wow. It just flush only local cpu. Then, no IPI.

Hmm...all other archs does the same thing ? If so, kernel_map_pages()
can be called under irq_disabled. The author of kernel_map_pages()
is aware that this can be called under irq-disabled.

Hmm...

Thanks,
-Kame






--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Mel Gorman on
On Tue, Apr 20, 2010 at 11:39:46AM +0900, Minchan Kim wrote:
> On Tue, Apr 20, 2010 at 4:39 AM, Mel Gorman <mel(a)csn.ul.ie> wrote:
> > On Mon, Apr 19, 2010 at 07:14:42PM +0100, Mel Gorman wrote:
> >> On Mon, Apr 19, 2010 at 07:01:33PM +0900, KAMEZAWA Hiroyuki wrote:
> >> >
> >> > mmotm 2010-04-15-14-42
> >> >
> >> > When I tried
> >> > �# echo 0 > /proc/sys/vm/compaction
> >> >
> >> > I see following.
> >> >
> >> > My enviroment was
> >> > � 2.6.34-rc4-mm1+ (2010-04-15-14-42) (x86-64) CPUx8
> >> > � allocating tons of hugepages and reduce free memory.
> >> >
> >> > What I did was:
> >> > � # echo 0 > /proc/sys/vm/compact_memory
> >> >
> >> > Hmm, I see this kind of error at migation for the 1st time..
> >> > my.config is attached. Hmm... ?
> >> >
> >> > (I'm sorry I'll be offline soon.)
> >>
> >> That's ok, thanks you for the report. I'm afraid I made little progress
> >> as I spent most of the day on other bugs but I do have something for
> >> you.
> >>
> >> First, I reproduced the problem using your .config. However, the problem does
> >> not manifest with the .config I normally use which is derived from the distro
> >> kernel configuration (Debian Lenny). So, there is something in your .config
> >> that triggers the problem. I very strongly suspect this is an interaction
> >> between migration, compaction and page allocation debug.
> >
> > I unexpecedly had the time to dig into this. Does the following patch fix
> > your problem? It Worked For Me.
>
> Nice catch during shot time. Below is comment.
>
> >
> > ==== CUT HERE ====
> > mm,compaction: Map free pages in the address space after they get split for compaction
> >
> > split_free_page() is a helper function which takes a free page from the
> > buddy lists and splits it into order-0 pages. It is used by memory
> > compaction to build a list of destination pages. If
> > CONFIG_DEBUG_PAGEALLOC is set, a kernel paging request bug is triggered
> > because split_free_page() did not call the arch-allocation hooks or map
> > the page into the kernel address space.
> >
> > This patch does not update split_free_page() as it is called with
> > interrupts held. Instead it documents that callers of split_free_page()
> > are responsible for calling the arch hooks and to map the page and fixes
> > compaction.
>
> Dumb question. Why can't we call arch_alloc_page and kernel_map_pages
> as interrupt disabled?

In theory, it isn't known what arch_alloc_page is going to do but more
practically kernel_map_pages() is updating mappings and should be
flushing all the TLBs. It can't do that with interrupts disabled.

I checked X86 and it should be fine but only because it flushes the
local CPU and appears to just hope for the best that this doesn't cause
problems.

> It's deadlock issue or latency issue?

deadlock

> I don't found any comment about it.

I'm not aware of any. arch_alloc_page() is only used by s390 so it's not
well known. kernel_map_pages() is only active for a rarely used
debugging option.

> It should have added the comment around that functions. :)
>
> And now compaction only uses split_free_page and it is exposed by mm.h.
> I think it would be better to map pages inside split_free_page to
> export others.(ie, making generic function).

I considered that and it would not be ideal. It would have to disable and
reenable interrupts as each page is taken from the list or alternatively
require that the caller not have the zone lock taken. The latter of these
options is more reasonable but would still result in more interrupt enabling
and disabling.

split_free_page() is extremely specialised and requires knowledge of the
page allocator internals to call properly. There is little pressure to
make this easier to use at the cost of increased locking.

> If we can't do, how about making split_free_page static as static function?
> And only uses it in compaction.
>

It pretty much has to be in page_alloc.c because it uses internal
functions of the page allocator - e.g. rmv_page_order. I could move it
to mm/internal.h because whatever about split_page, I can't imagine why
anyone else would need to call split_free_page.

--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Minchan Kim on
On Tue, Apr 20, 2010 at 5:20 PM, Mel Gorman <mel(a)csn.ul.ie> wrote:
> On Tue, Apr 20, 2010 at 11:39:46AM +0900, Minchan Kim wrote:
>> On Tue, Apr 20, 2010 at 4:39 AM, Mel Gorman <mel(a)csn.ul.ie> wrote:
>> > On Mon, Apr 19, 2010 at 07:14:42PM +0100, Mel Gorman wrote:
>> >> On Mon, Apr 19, 2010 at 07:01:33PM +0900, KAMEZAWA Hiroyuki wrote:
>> >> >
>> >> > mmotm 2010-04-15-14-42
>> >> >
>> >> > When I tried
>> >> >  # echo 0 > /proc/sys/vm/compaction
>> >> >
>> >> > I see following.
>> >> >
>> >> > My enviroment was
>> >> >   2.6.34-rc4-mm1+ (2010-04-15-14-42) (x86-64) CPUx8
>> >> >   allocating tons of hugepages and reduce free memory.
>> >> >
>> >> > What I did was:
>> >> >   # echo 0 > /proc/sys/vm/compact_memory
>> >> >
>> >> > Hmm, I see this kind of error at migation for the 1st time..
>> >> > my.config is attached. Hmm... ?
>> >> >
>> >> > (I'm sorry I'll be offline soon.)
>> >>
>> >> That's ok, thanks you for the report. I'm afraid I made little progress
>> >> as I spent most of the day on other bugs but I do have something for
>> >> you.
>> >>
>> >> First, I reproduced the problem using your .config. However, the problem does
>> >> not manifest with the .config I normally use which is derived from the distro
>> >> kernel configuration (Debian Lenny). So, there is something in your .config
>> >> that triggers the problem. I very strongly suspect this is an interaction
>> >> between migration, compaction and page allocation debug.
>> >
>> > I unexpecedly had the time to dig into this. Does the following patch fix
>> > your problem? It Worked For Me.
>>
>> Nice catch during shot time. Below is comment.
>>
>> >
>> > ==== CUT HERE ====
>> > mm,compaction: Map free pages in the address space after they get split for compaction
>> >
>> > split_free_page() is a helper function which takes a free page from the
>> > buddy lists and splits it into order-0 pages. It is used by memory
>> > compaction to build a list of destination pages. If
>> > CONFIG_DEBUG_PAGEALLOC is set, a kernel paging request bug is triggered
>> > because split_free_page() did not call the arch-allocation hooks or map
>> > the page into the kernel address space.
>> >
>> > This patch does not update split_free_page() as it is called with
>> > interrupts held. Instead it documents that callers of split_free_page()
>> > are responsible for calling the arch hooks and to map the page and fixes
>> > compaction.
>>
>> Dumb question. Why can't we call arch_alloc_page and kernel_map_pages
>> as interrupt disabled?
>
> In theory, it isn't known what arch_alloc_page is going to do but more
> practically kernel_map_pages() is updating mappings and should be
> flushing all the TLBs. It can't do that with interrupts disabled.
>
> I checked X86 and it should be fine but only because it flushes the
> local CPU and appears to just hope for the best that this doesn't cause
> problems.

Okay.

>> And now compaction only uses split_free_page and it is exposed by mm.h.
>> I think it would be better to map pages inside split_free_page to
>> export others.(ie, making generic function).
>
> I considered that and it would not be ideal. It would have to disable and
> reenable interrupts as each page is taken from the list or alternatively
> require that the caller not have the zone lock taken. The latter of these
> options is more reasonable but would still result in more interrupt enabling
> and disabling.
>
> split_free_page() is extremely specialised and requires knowledge of the
> page allocator internals to call properly. There is little pressure to
> make this easier to use at the cost of increased locking.
>
>> If we can't do, how about making split_free_page static as static function?
>> And only uses it in compaction.
>>
>
> It pretty much has to be in page_alloc.c because it uses internal
> functions of the page allocator - e.g. rmv_page_order. I could move it
> to mm/internal.h because whatever about split_page, I can't imagine why
> anyone else would need to call split_free_page.

Yes. Then, Let's add comment like split_page. :)
/*
* Note: this is probably too low level an operation for use in drivers.
* Please consult with lkml before using this in your driver.
*/


>
> --
> Mel Gorman
> Part-time Phd Student                          Linux Technology Center
> University of Limerick                         IBM Dublin Software Lab
>



--
Kind regards,
Minchan Kim
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Mel Gorman on
On Tue, Apr 20, 2010 at 05:32:13PM +0900, Minchan Kim wrote:
> On Tue, Apr 20, 2010 at 5:20 PM, Mel Gorman <mel(a)csn.ul.ie> wrote:
> > On Tue, Apr 20, 2010 at 11:39:46AM +0900, Minchan Kim wrote:
> >> On Tue, Apr 20, 2010 at 4:39 AM, Mel Gorman <mel(a)csn.ul.ie> wrote:
> >> > On Mon, Apr 19, 2010 at 07:14:42PM +0100, Mel Gorman wrote:
> >> >> On Mon, Apr 19, 2010 at 07:01:33PM +0900, KAMEZAWA Hiroyuki wrote:
> >> >> >
> >> >> > mmotm 2010-04-15-14-42
> >> >> >
> >> >> > When I tried
> >> >> > �# echo 0 > /proc/sys/vm/compaction
> >> >> >
> >> >> > I see following.
> >> >> >
> >> >> > My enviroment was
> >> >> > � 2.6.34-rc4-mm1+ (2010-04-15-14-42) (x86-64) CPUx8
> >> >> > � allocating tons of hugepages and reduce free memory.
> >> >> >
> >> >> > What I did was:
> >> >> > � # echo 0 > /proc/sys/vm/compact_memory
> >> >> >
> >> >> > Hmm, I see this kind of error at migation for the 1st time..
> >> >> > my.config is attached. Hmm... ?
> >> >> >
> >> >> > (I'm sorry I'll be offline soon.)
> >> >>
> >> >> That's ok, thanks you for the report. I'm afraid I made little progress
> >> >> as I spent most of the day on other bugs but I do have something for
> >> >> you.
> >> >>
> >> >> First, I reproduced the problem using your .config. However, the problem does
> >> >> not manifest with the .config I normally use which is derived from the distro
> >> >> kernel configuration (Debian Lenny). So, there is something in your .config
> >> >> that triggers the problem. I very strongly suspect this is an interaction
> >> >> between migration, compaction and page allocation debug.
> >> >
> >> > I unexpecedly had the time to dig into this. Does the following patch fix
> >> > your problem? It Worked For Me.
> >>
> >> Nice catch during shot time. Below is comment.
> >>
> >> >
> >> > ==== CUT HERE ====
> >> > mm,compaction: Map free pages in the address space after they get split for compaction
> >> >
> >> > split_free_page() is a helper function which takes a free page from the
> >> > buddy lists and splits it into order-0 pages. It is used by memory
> >> > compaction to build a list of destination pages. If
> >> > CONFIG_DEBUG_PAGEALLOC is set, a kernel paging request bug is triggered
> >> > because split_free_page() did not call the arch-allocation hooks or map
> >> > the page into the kernel address space.
> >> >
> >> > This patch does not update split_free_page() as it is called with
> >> > interrupts held. Instead it documents that callers of split_free_page()
> >> > are responsible for calling the arch hooks and to map the page and fixes
> >> > compaction.
> >>
> >> Dumb question. Why can't we call arch_alloc_page and kernel_map_pages
> >> as interrupt disabled?
> >
> > In theory, it isn't known what arch_alloc_page is going to do but more
> > practically kernel_map_pages() is updating mappings and should be
> > flushing all the TLBs. It can't do that with interrupts disabled.
> >
> > I checked X86 and it should be fine but only because it flushes the
> > local CPU and appears to just hope for the best that this doesn't cause
> > problems.
>
> Okay.
>
> >> And now compaction only uses split_free_page and it is exposed by mm.h.
> >> I think it would be better to map pages inside split_free_page to
> >> export others.(ie, making generic function).
> >
> > I considered that and it would not be ideal. It would have to disable and
> > reenable interrupts as each page is taken from the list or alternatively
> > require that the caller not have the zone lock taken. The latter of these
> > options is more reasonable but would still result in more interrupt enabling
> > and disabling.
> >
> > split_free_page() is extremely specialised and requires knowledge of the
> > page allocator internals to call properly. There is little pressure to
> > make this easier to use at the cost of increased locking.
> >
> >> If we can't do, how about making split_free_page static as static function?
> >> And only uses it in compaction.
> >>
> >
> > It pretty much has to be in page_alloc.c because it uses internal
> > functions of the page allocator - e.g. rmv_page_order. I could move it
> > to mm/internal.h because whatever about split_page, I can't imagine why
> > anyone else would need to call split_free_page.
>
> Yes. Then, Let's add comment like split_page. :)
> /*
> * Note: this is probably too low level an operation for use in drivers.
> * Please consult with lkml before using this in your driver.
> */
>

I can, but the comment that was there says it's like split_page except the
page is already free. This also covers not using it in a driver.

--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Mel Gorman on
On Tue, Apr 20, 2010 at 06:50:23PM +0900, Minchan Kim wrote:
> On Tue, Apr 20, 2010 at 5:44 PM, Mel Gorman <mel(a)csn.ul.ie> wrote:
> > On Tue, Apr 20, 2010 at 05:32:13PM +0900, Minchan Kim wrote:
> >>
> >> Yes. Then, Let's add comment like split_page. :)
> >> �/*
> >> �* Note: this is probably too low level an operation for use in drivers.
> >> �* Please consult with lkml before using this in your driver.
> >> �*/
> >>
> >
> > I can, but the comment that was there says it's like split_page except the
> > page is already free. This also covers not using it in a driver.
>
> I see. In addition, you already mentioned "As this is only being used
> for migration".
> I missed one.
> I don't have any against one.
> Will you repost v2 which move split_free_pages out of compaction.c?

I don't understand your suggestion. split_free_pages is already out of
compaction.c.

> Anyway, feel free to add my reviewed-by sign.
> Thanks, Mel.
>
> Reviewed-by: Minchan Kim <minchan.kim(a)gmail.com>
>

Thanks

--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/