From: Rik van Riel on
On 02/12/2010 07:00 AM, Mel Gorman wrote:
> Ordinarily when a high-order allocation fails, direct reclaim is entered to
> free pages to satisfy the allocation. With this patch, it is determined if
> an allocation failed due to external fragmentation instead of low memory
> and if so, the calling process will compact until a suitable page is
> freed. Compaction by moving pages in memory is considerably cheaper than
> paging out to disk and works where there are locked pages or no swap. If
> compaction fails to free a page of a suitable size, then reclaim will
> still occur.
>
> Direct compaction returns as soon as possible. As each block is compacted,
> it is checked if a suitable page has been freed and if so, it returns.
>
> Signed-off-by: Mel Gorman<mel(a)csn.ul.ie>

Acked-by: Rik van Riel <riel(a)redhat.com>

--
All rights reversed.
From: Minchan Kim on
On Fri, Feb 19, 2010 at 3:02 AM, Mel Gorman <mel(a)csn.ul.ie> wrote:
> Ordinarily when a high-order allocation fails, direct reclaim is entered to
> free pages to satisfy the allocation.  With this patch, it is determined if
> an allocation failed due to external fragmentation instead of low memory
> and if so, the calling process will compact until a suitable page is
> freed. Compaction by moving pages in memory is considerably cheaper than
> paging out to disk and works where there are locked pages or no swap. If
> compaction fails to free a page of a suitable size, then reclaim will
> still occur.
>
> Direct compaction returns as soon as possible. As each block is compacted,
> it is checked if a suitable page has been freed and if so, it returns.
>
> Signed-off-by: Mel Gorman <mel(a)csn.ul.ie>
> Acked-by: Rik van Riel <riel(a)redhat.com>
> ---
>  include/linux/compaction.h |   16 +++++-
>  include/linux/vmstat.h     |    1 +
>  mm/compaction.c            |  118 ++++++++++++++++++++++++++++++++++++++++++++
>  mm/page_alloc.c            |   26 ++++++++++
>  mm/vmstat.c                |   15 +++++-
>  5 files changed, 172 insertions(+), 4 deletions(-)
>
> diff --git a/include/linux/compaction.h b/include/linux/compaction.h
> index 6a2eefd..1cf95e2 100644
> --- a/include/linux/compaction.h
> +++ b/include/linux/compaction.h
> @@ -1,13 +1,25 @@
>  #ifndef _LINUX_COMPACTION_H
>  #define _LINUX_COMPACTION_H
>
> -/* Return values for compact_zone() */
> +/* Return values for compact_zone() and try_to_compact_pages() */
>  #define COMPACT_INCOMPLETE     0
> -#define COMPACT_COMPLETE       1
> +#define COMPACT_PARTIAL                1
> +#define COMPACT_COMPLETE       2
>
>  #ifdef CONFIG_COMPACTION
>  extern int sysctl_compaction_handler(struct ctl_table *table, int write,
>                        void __user *buffer, size_t *length, loff_t *ppos);
> +
> +extern int fragmentation_index(struct zone *zone, unsigned int order);
> +extern unsigned long try_to_compact_pages(struct zonelist *zonelist,
> +                       int order, gfp_t gfp_mask, nodemask_t *mask);
> +#else
> +static inline unsigned long try_to_compact_pages(struct zonelist *zonelist,
> +                       int order, gfp_t gfp_mask, nodemask_t *nodemask)
> +{
> +       return COMPACT_INCOMPLETE;
> +}
> +
>  #endif /* CONFIG_COMPACTION */
>
>  #if defined(CONFIG_COMPACTION) && defined(CONFIG_SYSFS) && defined(CONFIG_NUMA)
> diff --git a/include/linux/vmstat.h b/include/linux/vmstat.h
> index d7f7236..0ea7a38 100644
> --- a/include/linux/vmstat.h
> +++ b/include/linux/vmstat.h
> @@ -44,6 +44,7 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT,
>                KSWAPD_SKIP_CONGESTION_WAIT,
>                PAGEOUTRUN, ALLOCSTALL, PGROTATED,
>                COMPACTBLOCKS, COMPACTPAGES, COMPACTPAGEFAILED,
> +               COMPACTSTALL, COMPACTFAIL, COMPACTSUCCESS,
>  #ifdef CONFIG_HUGETLB_PAGE
>                HTLB_BUDDY_PGALLOC, HTLB_BUDDY_PGALLOC_FAIL,
>  #endif
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 02579c2..c7c73bb 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -34,6 +34,8 @@ struct compact_control {
>        unsigned long nr_anon;
>        unsigned long nr_file;
>
> +       unsigned int order;             /* order a direct compactor needs */
> +       int migratetype;                /* MOVABLE, RECLAIMABLE etc */
>        struct zone *zone;
>  };
>
> @@ -298,10 +300,31 @@ static void update_nr_listpages(struct compact_control *cc)
>  static inline int compact_finished(struct zone *zone,
>                                                struct compact_control *cc)
>  {
> +       unsigned int order;
> +       unsigned long watermark = low_wmark_pages(zone) + (1 << cc->order);
> +
>        /* Compaction run completes if the migrate and free scanner meet */
>        if (cc->free_pfn <= cc->migrate_pfn)
>                return COMPACT_COMPLETE;
>
> +       /* Compaction run is not finished if the watermark is not met */
> +       if (!zone_watermark_ok(zone, cc->order, watermark, 0, 0))
> +               return COMPACT_INCOMPLETE;
> +
> +       if (cc->order == -1)
> +               return COMPACT_INCOMPLETE;

Where do we set cc->order = -1?
Sorry but I can't find it.


--
Kind regards,
Minchan Kim
From: Mel Gorman on
On Fri, Feb 19, 2010 at 11:41:56AM +0900, Minchan Kim wrote:
> On Fri, Feb 19, 2010 at 3:02 AM, Mel Gorman <mel(a)csn.ul.ie> wrote:
> > [... changelog and diff snipped; quoted in full above ...]
> > @@ -298,10 +300,31 @@ static void update_nr_listpages(struct compact_control *cc)
> >  static inline int compact_finished(struct zone *zone,
> >                                                struct compact_control *cc)
> >  {
> > +       unsigned int order;
> > +       unsigned long watermark = low_wmark_pages(zone) + (1 << cc->order);
> > +
> >        /* Compaction run completes if the migrate and free scanner meet */
> >        if (cc->free_pfn <= cc->migrate_pfn)
> >                return COMPACT_COMPLETE;
> >
> > +       /* Compaction run is not finished if the watermark is not met */
> > +       if (!zone_watermark_ok(zone, cc->order, watermark, 0, 0))
> > +               return COMPACT_INCOMPLETE;
> > +
> > +       if (cc->order == -1)
> > +               return COMPACT_INCOMPLETE;
>
> Where do we set cc->order = -1?
> Sorry but I can't find it.
>

Good spot, it should have been set in compact_node() to force a full
compaction.
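
Roughly, what I have in mind is something like the following (sketch only;
the compact_control fields are the ones used earlier in this series and the
exact initialisation may differ in the respin):

/* Compact all populated zones within a node (sketch only) */
static int compact_node(int nid)
{
        int zoneid;
        pg_data_t *pgdat = NODE_DATA(nid);

        for (zoneid = 0; zoneid < MAX_NR_ZONES; zoneid++) {
                struct zone *zone = &pgdat->node_zones[zoneid];
                struct compact_control cc = {
                        .nr_freepages = 0,
                        .nr_migratepages = 0,
                        .order = -1,    /* whole zone; implies signed cc->order */
                };

                if (!populated_zone(zone))
                        continue;

                cc.zone = zone;
                INIT_LIST_HEAD(&cc.freepages);
                INIT_LIST_HEAD(&cc.migratepages);

                compact_zone(zone, &cc);
        }

        return 0;
}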

--
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab
From: Minchan Kim on
On Sat, Mar 13, 2010 at 1:41 AM, Mel Gorman <mel(a)csn.ul.ie> wrote:
> Ordinarily when a high-order allocation fails, direct reclaim is entered to
> free pages to satisfy the allocation.  With this patch, it is determined if
> an allocation failed due to external fragmentation instead of low memory
> and if so, the calling process will compact until a suitable page is
> freed. Compaction by moving pages in memory is considerably cheaper than
> paging out to disk and works where there are locked pages or no swap. If
> compaction fails to free a page of a suitable size, then reclaim will
> still occur.
>
> Direct compaction returns as soon as possible. As each block is compacted,
> it is checked if a suitable page has been freed and if so, it returns.
>
> Signed-off-by: Mel Gorman <mel(a)csn.ul.ie>
> Acked-by: Rik van Riel <riel(a)redhat.com>
Reviewed-by: Minchan Kim <minchan.kim(a)gmail.com>

At least, I can't find any more faults. :)

--
Kind regards,
Minchan Kim
From: KOSAKI Motohiro on
> @@ -1765,6 +1766,31 @@ __alloc_pages_direct_reclaim(gfp_t gfp_mask, unsigned int order,
>
>        cond_resched();
>
> +       /* Try memory compaction for high-order allocations before reclaim */
> +       if (order) {
> +               *did_some_progress = try_to_compact_pages(zonelist,
> +                                               order, gfp_mask, nodemask);
> +               if (*did_some_progress != COMPACT_INCOMPLETE) {
> +                       page = get_page_from_freelist(gfp_mask, nodemask,
> +                                       order, zonelist, high_zoneidx,
> +                                       alloc_flags, preferred_zone,
> +                                       migratetype);
> +                       if (page) {
> +                               __count_vm_event(COMPACTSUCCESS);
> +                               return page;
> +                       }
> +
> +                       /*
> +                        * It's bad if compaction run occurs and fails.
> +                        * The most likely reason is that pages exist,
> +                        * but not enough to satisfy watermarks.
> +                        */
> +                       count_vm_event(COMPACTFAIL);
> +
> +                       cond_resched();
> +               }
> +       }
> +

Hmm..Hmmm...........

Today I've reviewed this patch and [11/11] carefully, twice, but it is hard to ack.

This patch seems to assume page compaction is faster than direct
reclaim, but often it isn't, because dropping useless page cache is a very
lightweight operation while page compaction does a lot of memcpy (i.e. CPU cache
pollution). IOW, this patch focuses very aggressively on hugepage allocation, but
it doesn't seem to take enough care to limit the damage to typical workloads.


First, I would like to use this mail to clarify the current reclaim corner cases and what vmscan should do about them.

Today we have lumpy reclaim. It is an excellent solution for external fragmentation,
but unfortunately it has lots of corner cases.

Viewpoint 1. Unnecessary IO

isolate_pages() for lumpy reclaim frequently grabs very young pages, which are
often still dirty, so pageout() gets called a lot.

Unfortunately, page-sized, page-at-a-time IO is _very_ inefficient. It can cause
lots of disk seeks and kill disk IO bandwidth.


Viewpoint 2. Unevictable pages

isolate_pages() for lumpy reclaim can pick up unevictable pages, which are obviously
undroppable. So if the zone has plenty of mlocked pages (not a rare case for
server workloads), lumpy reclaim can become pretty useless.


Viewpoint 3. GFP_ATOMIC allocation failure

Obviously lumpy reclaim can't help with GFP_ATOMIC allocations, since they can't enter reclaim at all.


Viewpoint 4. Reclaim latency

Reclaim latency directly affects page allocation latency, so if lumpy reclaim with
a lot of pageout IO is slow (and it often is), page allocation latency suffers and
the end-user experience can degrade.


I really hoped that automatic page migration would help solve the above issues,
but sadly this patch doesn't seem to.

Honestly, I think this patch would have been very impressive and useful 2-3 years ago,
because 1) we didn't have lumpy reclaim and 2) we didn't have sane reclaim bail-out.
Back then, vmscan was a very heavyweight and inefficient operation for high-order
reclaim, so the downside of adding this page migration would have been relatively
well hidden. But...

We have to make an effort to reduce reclaim latency, not add a new latency source.
Instead, I would recommend tightly integrating page compaction and lumpy reclaim,
i.e. 1) reuse lumpy reclaim's neighbour-pfn page pick-up logic and 2) do page
migration instead of pageout when the page is in a certain condition (for example
active, dirty, referenced or swapbacked).
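
As pure pseudocode, I mean something like the following (the cluster list and
both helpers are made-up names, not existing kernel interfaces; only the page
flag tests are real):

        struct page *page, *next;

        /* pages picked up by lumpy reclaim's neighbour-pfn logic */
        list_for_each_entry_safe(page, next, &lumpy_cluster, lru) {
                if (PageActive(page) || PageDirty(page) ||
                    PageReferenced(page) || PageSwapBacked(page)) {
                        /*
                         * Expensive or risky to page out: migrate the data
                         * to another page instead of writing it to disk.
                         */
                        migrate_instead_of_pageout(page);  /* made-up helper */
                } else {
                        /* cheap to drop: reclaim it the usual way */
                        reclaim_as_usual(page);            /* made-up helper */
                }
        }

That way we would keep lumpy reclaim's ability to free a contiguous range
without the page-sized IO problem of viewpoint 1.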

This patch seems shoot me! /me die. R.I.P. ;-)


Btw, please don't use 'hugeadm --set-recommended-min_free_kbytes' when testing.
Evaluating the free-memory-starvation case is very important for this patch
series, I think, and I somewhat suspect this patch might invoke useless compaction
in such a case.



Bottom line: the explicit compaction trigger via /proc can be merged soon, I think,
but this automatic compaction logic seems to need more discussion.




