From: Christoph Lameter on
On Fri, 28 Dec 2007, Dhaval Giani wrote:

> we managed to get your required information. Last 10,000 lines are
> attached (The uncompressed file comes to 500 kb).
>
> Hope it helps.

Somehow the nr_pages field is truncated to 16 bits, and it
seems that there are sign issues there? We are wrapping around....

q->nr_pages is 36877, min_pages is 25 ----> swapper
q->nr_pages is 46266, min_pages is 25 ----> bash
q->nr_pages is 36877, min_pages is 25 ----> swapper
q->nr_pages is 36877, min_pages is 25 ----> swapper
q->nr_pages is 46265, min_pages is 25 ----> bash
q->nr_pages is 46265, min_pages is 25 ----> cat
q->nr_pages is 36877, min_pages is 25 ----> swapper
q->nr_pages is 46265, min_pages is 25 ----> cat
q->nr_pages is 36877, min_pages is 25 ----> swapper
q->nr_pages is 0, min_pages is 25 ----> swapper
q->nr_pages is 36877, min_pages is 25 ----> swapper
q->nr_pages is 36877, min_pages is 25 ----> swapper
q->nr_pages is 46265, min_pages is 25 ----> cat


Is an int just a 16-bit field on i386? I thought it was 32 bits. Or is
the result due to the way that systemtap works?
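
For illustration only, here is a minimal sketch of what such a truncation
would look like if it is really happening. The helper names (low16,
low16_signed, check_wraparound) and the input 102413 are made up for this
example; nothing here comes from the trace or from systemtap itself.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: keep only the low 16 bits of a 32-bit counter,
 * as would happen if nr_pages were read through a 16-bit field. */
uint16_t low16(uint32_t count)
{
	return (uint16_t)(count & 0xffffu);
}

/* Interpret the same 16 bits as a signed two's complement value. */
int32_t low16_signed(uint32_t count)
{
	uint16_t u = low16(count);

	return u >= 0x8000u ? (int32_t)u - 0x10000 : (int32_t)u;
}

/* Self-check with made-up inputs: 102413 is 65536 + 36877. */
void check_wraparound(void)
{
	assert(low16(102413u) == 36877u);	/* wraps past 65535 */
	assert(low16_signed(36877u) == -28659);	/* sign bit is set */
}
```

Every value in the trace is below 65536, which is consistent with a 16-bit
view of the counter, but the sketch cannot tell whether the kernel variable
or only the probe output is truncated.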

Could you post the per-cpu variables neighboring quicklist (look at the
System.map)? Maybe we somehow corrupt nr_pages and the page contents.

Also, could you do another systemtap run and also print out the current
processor? Maybe nr_pages only gets corrupted on a specific processor. I
see a zero there and sometimes other sane values.



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Christoph Lameter on
Just traced it again on my system: It is okay for the number of pages on
the quicklist to reach the high count that we see (although the 16 bit
limits are weird. You have around 4GB of memory in the system?). Up to
1/16th of free memory of a node can be allocated for quicklists (this
allows the effective shutting down and restarting of large amounts of
processes)

The problem may be that this is run on a HIGHMEM system and the
calculation of allowable pages on the quicklists does not take into
account that highmem pages are not usable for quicklists (not sure about
ZONE_MOVABLE on i386. Maybe we need to take that into account as well?)
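
As a rough userspace model of the calculation described above (not the
kernel code), the sketch below compares the cap computed from all free
pages with the cap computed after free highmem pages are taken out. The
function names and the page counts in the note that follows are
assumptions chosen for illustration.

```c
#include <assert.h>

#define FRACTION_OF_NODE_MEM 16	/* the 1/16th limit mentioned above */

/* Model of the quicklist cap: a fraction of the free pages,
 * but never less than min_pages. */
unsigned long quicklist_cap(unsigned long free_pages, unsigned long min_pages)
{
	unsigned long cap = free_pages / FRACTION_OF_NODE_MEM;

	return cap > min_pages ? cap : min_pages;
}

/* Same cap, with free highmem pages removed from the total first. */
unsigned long quicklist_cap_lowmem(unsigned long free_pages,
				   unsigned long free_highmem_pages,
				   unsigned long min_pages)
{
	assert(free_highmem_pages <= free_pages);
	return quicklist_cap(free_pages - free_highmem_pages, min_pages);
}
```

With a hypothetical 1,000,000 free pages of which 800,000 are highmem,
quicklist_cap(1000000, 25) gives 62500 pages while
quicklist_cap_lowmem(1000000, 800000, 25) gives 12500. In that made-up
case the uncorrected cap is close to a third of the 200,000 free lowmem
pages.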

Here is a patch that removes the HIGHMEM portion from the calculation.
Does this change anything:

Index: linux-2.6/mm/quicklist.c
===================================================================
--- linux-2.6.orig/mm/quicklist.c 2008-01-02 13:41:10.000000000 -0800
+++ linux-2.6/mm/quicklist.c 2008-01-02 13:44:15.000000000 -0800
@@ -29,6 +29,12 @@ static unsigned long max_pages(unsigned

 	node_free_pages = node_page_state(numa_node_id(),
 			NR_FREE_PAGES);
+#ifdef CONFIG_HIGHMEM
+	/* Take HIGHMEM pages out of consideration */
+	node_free_pages -= zone_page_state(&NODE_DATA(numa_node_id())->node_zones[ZONE_HIGHMEM],
+			NR_FREE_PAGES);
+#endif
+
 	max = node_free_pages / FRACTION_OF_NODE_MEM;
 	return max(max, min_pages);
 }
From: Dhaval Giani on
On Thu, Jan 03, 2008 at 09:29:42AM +0530, Dhaval Giani wrote:
> On Wed, Jan 02, 2008 at 01:54:12PM -0800, Christoph Lameter wrote:
> > Just traced it again on my system: It is okay for the number of pages on
> > the quicklist to reach the high count that we see (although the 16 bit
> > limits are weird. You have around 4GB of memory in the system?). Up to
> > 1/16th of free memory of a node can be allocated for quicklists (this
> > allows the effective shutting down and restarting of large amounts of
> > processes)
> >
> > The problem may be that this is run on a HIGHMEM system and the
> > calculation of allowable pages on the quicklists does not take into
> > account that highmem pages are not usable for quicklists (not sure about
> > ZONE_MOVABLE on i386. Maybe we need to take that into account as well?)
> >
> > Here is a patch that removes the HIGHMEM portion from the calculation.
> > Does this change anything:
> >
>
> Yep. This one hits it. I don't see the obvious signs of the oom
> happening in the 5 mins I have run the script. I will let it run for
> some more time.
>

Yes, no oom even after 20 mins of running (which is double the normal
time for the oom to occur), also no changes in free lowmem.

Thanks for the fix. Feel free to add a

Tested-by: Dhaval Giani <dhaval(a)linux.vnet.ibm.com>

--
regards,
Dhaval
From: Dhaval Giani on
On Wed, Jan 02, 2008 at 01:54:12PM -0800, Christoph Lameter wrote:
> Just traced it again on my system: It is okay for the number of pages on
> the quicklist to reach the high count that we see (although the 16 bit
> limits are weird. You have around 4GB of memory in the system?). Up to
> 1/16th of free memory of a node can be allocated for quicklists (this
> allows the effective shutting down and restarting of large amounts of
> processes)
>
> The problem may be that this is run on a HIGHMEM system and the
> calculation of allowable pages on the quicklists does not take into
> account that highmem pages are not usable for quicklists (not sure about
> ZONE_MOVABLE on i386. Maybe we need to take that into account as well?)
>
> Here is a patch that removes the HIGHMEM portion from the calculation.
> Does this change anything:
>

Yep. This one hits it. I don't see the obvious signs of the oom
happening in the 5 mins I have run the script. I will let it run for
some more time.

Thanks!
--
regards,
Dhaval
From: Christoph Lameter on
On Thu, 3 Jan 2008, Dhaval Giani wrote:

> Yes, no oom even after 20 mins of running (which is double the normal
> time for the oom to occur), also no changes in free lowmem.

Ahhh.. Good. Then let's redo the patchset the right way (the patch so far
does not address the ZONE_MOVABLE issues). Does this patch
also do the trick?



Quicklists: Only consider memory that can be allocated via GFP_KERNEL

The quicklist code calculates the size of the quicklists based on the number
of free pages. This must be the number of free pages that can be
allocated with GFP_KERNEL. node_page_state() includes the pages in
ZONE_HIGHMEM and ZONE_MOVABLE. These should not be considered for the
size calculation.

Signed-off-by: Christoph Lameter <clameter(a)sgi.com>

Index: linux-2.6/mm/quicklist.c
===================================================================
--- linux-2.6.orig/mm/quicklist.c 2008-01-03 12:22:55.000000000 -0800
+++ linux-2.6/mm/quicklist.c 2008-01-03 13:00:30.000000000 -0800
@@ -26,9 +26,17 @@ DEFINE_PER_CPU(struct quicklist, quickli
 static unsigned long max_pages(unsigned long min_pages)
 {
 	unsigned long node_free_pages, max;
+	struct zone *zones = NODE_DATA(numa_node_id())->node_zones;
+
+	node_free_pages =
+#ifdef CONFIG_ZONE_DMA
+		zone_page_state(&zones[ZONE_DMA], NR_FREE_PAGES) +
+#endif
+#ifdef CONFIG_ZONE_DMA32
+		zone_page_state(&zones[ZONE_DMA32], NR_FREE_PAGES) +
+#endif
+		zone_page_state(&zones[ZONE_NORMAL], NR_FREE_PAGES);
 
-	node_free_pages = node_page_state(numa_node_id(),
-			NR_FREE_PAGES);
 	max = node_free_pages / FRACTION_OF_NODE_MEM;
 	return max(max, min_pages);
 }
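
The idea behind this version of the patch can be sketched in plain
userspace C as follows. The enum and function names are simplified
stand-ins invented for this example, not the kernel's zone definitions,
and the per-zone counts in the note below are made up.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for zone indices (illustrative only). */
enum model_zone { M_DMA, M_DMA32, M_NORMAL, M_HIGHMEM, M_MOVABLE, M_NR_ZONES };

/* Sum free pages only over the zones a GFP_KERNEL allocation can use. */
unsigned long gfp_kernel_free_pages(const unsigned long zone_free[M_NR_ZONES])
{
	assert(zone_free != NULL);
	return zone_free[M_DMA] + zone_free[M_DMA32] + zone_free[M_NORMAL];
}

/* Sum over every zone, the way a node-wide counter would. */
unsigned long node_free_pages_all(const unsigned long zone_free[M_NR_ZONES])
{
	unsigned long sum = 0;

	assert(zone_free != NULL);
	for (size_t z = 0; z < M_NR_ZONES; z++)
		sum += zone_free[z];
	return sum;
}
```

For made-up counts of { 1000, 0, 199000, 700000, 100000 } free pages per
zone, the node-wide sum is 1,000,000 while the GFP_KERNEL-eligible sum is
200,000. Summing the eligible zones explicitly leaves out both the highmem
and the movable pages, which subtracting highmem alone would not do.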