From: Rafael J. Wysocki on
On Sunday 01 November 2009, KOSAKI Motohiro wrote:
> shrink_all_zones() was introduced by commit d6277db4ab ("swsusp: rework
> memory shrinker") to improve hibernation performance, and sc.swap_cluster_max
> was introduced by commit a06fe4d307 ("Speed freeing memory for suspend").
>
> Commit a06fe4d307 said:
>
> Without the patch:
> Freed 14600 pages in 1749 jiffies = 32.61 MB/s (Anomolous!)
> Freed 88563 pages in 14719 jiffies = 23.50 MB/s
> Freed 205734 pages in 32389 jiffies = 24.81 MB/s
>
> With the patch:
> Freed 68252 pages in 496 jiffies = 537.52 MB/s
> Freed 116464 pages in 569 jiffies = 798.54 MB/s
> Freed 209699 pages in 705 jiffies = 1161.89 MB/s
>
> At that time, the patch was certainly worthwhile. However, modern hardware
> trends and recent VM improvements have eroded its benefit. For several
> reasons, I think we should remove shrink_all_zones() entirely.
>
> In detail:
>
> 1) In the old days, shrink_zone()'s slowness was mainly caused by pointless
> I/O throttling even when there was no I/O congestion.
> The current shrink_zone() is sane and no longer slow.
>
> 2) shrink_all_zones() tries to shrink all pages at once, but that does not
> work well on NUMA systems.
> example)
> The system has 4GB of memory, each node has 2GB, and hibernation needs 1GB.
>
> optimal)
> steal 500MB from each node.
> shrink_all_zones)
> steal 1GB from node-0.
>
> Oh, the cache balancing logic is broken. ;)
> Unfortunately, desktop systems have moved to NUMA nowadays.
> (Side note: if hibernation required 2GB, shrink_all_zones() could never
> succeed on the above machine.)
>
> 3) If a node has several pages with I/O in flight, shrink_all_zones() produces
> a pretty bad result, as modelled in the sketch after this list.
>
> scenario) hibernation needs 1GB
>
> 1) shrink_all_zones() tries to reclaim 1GB from Node-0
> 2) but it only reclaims 990MB
> 3) stupidly, shrink_all_zones() then tries to reclaim 1GB from Node-1
> 4) it reclaims 990MB
>
> Oh well. It reclaimed nearly twice as much as required.
> Current shrink_zone(), on the other hand, has sane bail-out logic, so it does
> not over-reclaim; that removes the risk shrink_all_zones() carries.
>
> 4) The split-LRU VM keeps the active/inactive ratio balanced very carefully.
> Shrinking only the inactive list breaks that assumption and creates an
> unnecessary OOM risk; it is obviously suboptimal.
>
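To make scenarios 2) and 3) concrete, here is a small standalone model (plain
userspace C, not kernel code; the node count, the 990MB per-node limit and the
1GB target are taken from the example above, and the bail-out loop only
approximates what do_try_to_free_pages() does):

/*
 * Model of per-node full-target reclaim vs. one global target with bail-out.
 */
#include <stdio.h>

#define NR_NODES	2
#define NODE_MAX_MB	990	/* reclaimable per node in the scenario */
#define TARGET_MB	1024	/* hibernation needs about 1GB */

static int reclaim_from_node(int want_mb)
{
	return want_mb < NODE_MAX_MB ? want_mb : NODE_MAX_MB;
}

int main(void)
{
	int node, freed;

	/* ask every node for the full target, no global bail-out: 1980MB */
	freed = 0;
	for (node = 0; node < NR_NODES; node++)
		freed += reclaim_from_node(TARGET_MB);
	printf("per-node target : %4d MB freed for a %d MB goal\n", freed, TARGET_MB);

	/* one global target, bail out once it is met: exactly 1024MB */
	freed = 0;
	for (node = 0; node < NR_NODES && freed < TARGET_MB; node++)
		freed += reclaim_from_node(TARGET_MB - freed);
	printf("global bail-out : %4d MB freed for a %d MB goal\n", freed, TARGET_MB);

	return 0;
}
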
> Therefore, this patch turns shrink_all_memory() into a thin wrapper around
> do_try_to_free_pages(). That improves reviewability and debuggability, and
> solves the problems above.
>
> Side note: unifying the reclaim logic has two good side effects.
> - It fixes a recursive-reclaim bug in shrink_all_memory(): the function forgot
> to set PF_MEMALLOC, which means the system could get stuck in a deadlock.
> - shrink_all_memory() now gets lockdep awareness, which improves debuggability.
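
As an aside on the PF_MEMALLOC point: the sketch below is a simplified
standalone model of the recursion guard, not kernel code. As I understand it,
the real guard lives in the page allocator, which skips direct reclaim for
tasks that have PF_MEMALLOC set.

#include <stdbool.h>
#include <stdio.h>

static bool task_in_reclaim;	/* stands in for current->flags & PF_MEMALLOC */

static void reclaim_pages(void);

/* An allocation made from inside reclaim, e.g. while writing pages out. */
static void allocate_page(void)
{
	if (task_in_reclaim) {
		/* modelled allocator: skip reclaim, fall back to reserves */
		puts("nested allocation: skip reclaim, use reserves");
		return;
	}
	reclaim_pages();	/* without the flag we would recurse here */
}

static void reclaim_pages(void)
{
	task_in_reclaim = true;		/* p->flags |= PF_MEMALLOC; */
	allocate_page();
	task_in_reclaim = false;	/* p->flags &= ~PF_MEMALLOC; */
}

int main(void)
{
	reclaim_pages();	/* prints the "nested allocation" line once */
	return 0;
}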

As I said previously, I don't really see a reason to keep shrink_all_memory().

Do you think that removing it will result in performance degradation?

Rafael
From: KOSAKI Motohiro on
> > Therefore, this patch turns shrink_all_memory() into a thin wrapper around
> > do_try_to_free_pages(). That improves reviewability and debuggability, and
> > solves the problems above.
> >
> > Side note: unifying the reclaim logic has two good side effects.
> > - It fixes a recursive-reclaim bug in shrink_all_memory(): the function
> > forgot to set PF_MEMALLOC, which means the system could get stuck in a deadlock.
> > - shrink_all_memory() now gets lockdep awareness, which improves debuggability.
>
> As I said previously, I don't really see a reason to keep shrink_all_memory().
>
> Do you think that removing it will result in performance degradation?

Hmm...
Probably I misunderstood your point. I thought you were suggesting killing all
of the hibernation-specific reclaim code, which this patch does. There is no
performance degradation (at least, I didn't observe any).

But if you hope to kill the shrink_all_memory() function itself, the short
answer is that it's impossible.

The current VM reclaim code requires some preparation from the caller, and that
preparation already exists in both alloc_pages_slowpath() and
try_to_free_pages(). We can't omit it.

Please see the shrink_all_memory() code below. It's pretty small, containing
only a little vmscan preparation (the PF_MEMALLOC guard, the reclaim_state
hookup for slab accounting, and the lockdep reclaim annotation). I don't think
it will be hard to maintain.


=====================================================
unsigned long shrink_all_memory(unsigned long nr_to_reclaim)
{
	struct reclaim_state reclaim_state;
	struct scan_control sc = {
		.gfp_mask = GFP_HIGHUSER_MOVABLE,
		.may_swap = 1,
		.may_unmap = 1,
		.may_writepage = 1,
		.nr_to_reclaim = nr_to_reclaim,
		.hibernation_mode = 1,
		.swappiness = vm_swappiness,
		.order = 0,
		.isolate_pages = isolate_pages_global,
	};
	struct zonelist *zonelist = node_zonelist(numa_node_id(), sc.gfp_mask);
	struct task_struct *p = current;
	unsigned long nr_reclaimed;

	p->flags |= PF_MEMALLOC;
	lockdep_set_current_reclaim_state(sc.gfp_mask);
	reclaim_state.reclaimed_slab = 0;
	p->reclaim_state = &reclaim_state;

	nr_reclaimed = do_try_to_free_pages(zonelist, &sc);

	p->reclaim_state = NULL;
	lockdep_clear_current_reclaim_state();
	p->flags &= ~PF_MEMALLOC;

	return nr_reclaimed;
}


From: KOSAKI Motohiro on
Hi

Thank you for the review :)

> > 2) shrink_all_zones() tries to shrink all pages at once, but that does not
> > work well on NUMA systems.
> > example)
> > The system has 4GB of memory, each node has 2GB, and hibernation needs 1GB.
> >
> > optimal)
> > steal 500MB from each node.
> > shrink_all_zones)
> > steal 1GB from node-0.
>
> I haven't given much thought to numa awareness in hibernate code, but I
> can say that the shrink_all_memory interface is woefully inadequate as
> far as zone awareness goes. Since lowmem needs to be atomically restored
> before we can restore highmem, we really need to be able to ask for a
> particular number of pages of a particular zone type to be freed.

Honestly, I am not a suspend/hibernation expert. Can I ask why the caller needs
to know the per-zone number of freed pages? If hibernation doesn't need highmem,
the following incremental patch prevents highmem reclaim entirely. Is that enough?


---
mm/vmscan.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index e6ea011..7fb3435 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2265,7 +2265,7 @@ unsigned long shrink_all_memory(unsigned long nr_to_reclaim)
 {
 	struct reclaim_state reclaim_state;
 	struct scan_control sc = {
-		.gfp_mask = GFP_HIGHUSER_MOVABLE,
+		.gfp_mask = GFP_KERNEL,
 		.may_swap = 1,
 		.may_unmap = 1,
 		.may_writepage = 1,
--
1.6.2.5
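
If I read the vmscan code correctly, this works because do_try_to_free_pages()
only scans zones up to gfp_zone(sc.gfp_mask), and GFP_KERNEL, unlike
GFP_HIGHUSER_MOVABLE, does not include __GFP_HIGHMEM. The sketch below is a
simplified standalone model of that decision (plain C; the flag bit and the
zone list are made up for illustration, not the kernel's real values):

#include <stdio.h>

enum zone_type { ZONE_DMA, ZONE_NORMAL, ZONE_HIGHMEM, NR_ZONES };

#define MODEL_GFP_HIGHMEM	0x1u	/* hypothetical bit, not the kernel's value */

static enum zone_type gfp_zone_model(unsigned int flags)
{
	return (flags & MODEL_GFP_HIGHMEM) ? ZONE_HIGHMEM : ZONE_NORMAL;
}

static void scan_zones(const char *mask_name, unsigned int flags)
{
	static const char * const names[NR_ZONES] = { "DMA", "Normal", "HighMem" };
	int z, high = gfp_zone_model(flags);

	/* reclaim walks the zonelist only up to the highest allowed zone */
	printf("%-30s:", mask_name);
	for (z = 0; z <= high; z++)
		printf(" %s", names[z]);
	putchar('\n');
}

int main(void)
{
	scan_zones("GFP_HIGHUSER_MOVABLE-like mask", MODEL_GFP_HIGHMEM);
	scan_zones("GFP_KERNEL-like mask", 0);
	return 0;
}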



From: Rafael J. Wysocki on
On Monday 02 November 2009, KOSAKI Motohiro wrote:
> > > Therefore, this patch turns shrink_all_memory() into a thin wrapper around
> > > do_try_to_free_pages(). That improves reviewability and debuggability, and
> > > solves the problems above.
> > >
> > > Side note: unifying the reclaim logic has two good side effects.
> > > - It fixes a recursive-reclaim bug in shrink_all_memory(): the function
> > > forgot to set PF_MEMALLOC, which means the system could get stuck in a deadlock.
> > > - shrink_all_memory() now gets lockdep awareness, which improves debuggability.
> >
> > As I said previously, I don't really see a reason to keep shrink_all_memory().
> >
> > Do you think that removing it will result in performance degradation?
>
> Hmm...
> Probably I misunderstood your point. I thought you were suggesting killing all
> of the hibernation-specific reclaim code, which this patch does. There is no
> performance degradation (at least, I didn't observe any).
>
> But if you hope to kill the shrink_all_memory() function itself, the short
> answer is that it's impossible.
>
> The current VM reclaim code requires some preparation from the caller, and that
> preparation already exists in both alloc_pages_slowpath() and
> try_to_free_pages(). We can't omit it.

Well, my grepping for 'shrink_all_memory' throughout the entire kernel source
code seems to indicate that hibernate_preallocate_memory() is the only current
user of it. I may be wrong, but I doubt it, unless some new users have been
added since 2.6.31.

In case I'm not wrong, it should be safe to drop it from
hibernate_preallocate_memory(), because it's there for performance reasons
only. Now, since hibernate_preallocate_memory() appears to be the only user of
it, it should be safe to drop it entirely.

> Please see the shrink_all_memory() code below. It's pretty small, containing
> only a little vmscan preparation. I don't think it will be hard to maintain.

No, it's not, but I'm really not sure it's worth keeping.

Thanks,
Rafael


> =====================================================
> unsigned long shrink_all_memory(unsigned long nr_to_reclaim)
> {
> 	struct reclaim_state reclaim_state;
> 	struct scan_control sc = {
> 		.gfp_mask = GFP_HIGHUSER_MOVABLE,
> 		.may_swap = 1,
> 		.may_unmap = 1,
> 		.may_writepage = 1,
> 		.nr_to_reclaim = nr_to_reclaim,
> 		.hibernation_mode = 1,
> 		.swappiness = vm_swappiness,
> 		.order = 0,
> 		.isolate_pages = isolate_pages_global,
> 	};
> 	struct zonelist *zonelist = node_zonelist(numa_node_id(), sc.gfp_mask);
> 	struct task_struct *p = current;
> 	unsigned long nr_reclaimed;
>
> 	p->flags |= PF_MEMALLOC;
> 	lockdep_set_current_reclaim_state(sc.gfp_mask);
> 	reclaim_state.reclaimed_slab = 0;
> 	p->reclaim_state = &reclaim_state;
>
> 	nr_reclaimed = do_try_to_free_pages(zonelist, &sc);
>
> 	p->reclaim_state = NULL;
> 	lockdep_clear_current_reclaim_state();
> 	p->flags &= ~PF_MEMALLOC;
>
> 	return nr_reclaimed;
> }
From: Rafael J. Wysocki on
On Monday 02 November 2009, Nigel Cunningham wrote:
> Hi.

Hi,

> KOSAKI Motohiro wrote:
> >> I haven't given much thought to numa awareness in hibernate code, but I
> >> can say that the shrink_all_memory interface is woefully inadequate as
> >> far as zone awareness goes. Since lowmem needs to be atomically restored
> >> before we can restore highmem, we really need to be able to ask for a
> >> particular number of pages of a particular zone type to be freed.
> >
> > Honestly, I am not a suspend/hibernation expert. Can I ask why the caller
> > needs to know the per-zone number of freed pages? If hibernation doesn't need
> > highmem, the following incremental patch prevents highmem reclaim entirely.
> > Is that enough?
>
> (Disclaimer: I don't think about highmem a lot any more, and might have
> forgotten some of the details, or swsusp's algorithms might have
> changed. Rafael might need to correct some of this...)
>
> Imagine that you have a system with 1000 pages of lowmem and 5000 pages
> of highmem. Of these, 950 lowmem pages are in use and 500 highmem pages
> are in use.
>
> In order to be able to save an image, we need to be able to do an
> atomic copy of those lowmem pages.
>
> You might think that we could just copy everything into the spare
> highmem pages, but we can't because mapping and unmapping the highmem
> pages as we copy the data will leave us with an inconsistent copy.

This isn't the case any more for the mainline hibernate code. We use highmem
for storing image data as well as lowmem.

Thanks,
Rafael