From: Pekka Enberg on
Pekka Enberg wrote:
> Christoph Lameter wrote:
>> I wonder if this is not related to the kmem_cache_cpu structure
>> straggling
>> cache line boundaries under some conditions. On 2.6.33 the kmem_cache_cpu
>> structure was larger and therefore tight packing resulted in different
>> alignment.
>>
>> Could you see how the following patch affects the results. It attempts to
>> increase the size of kmem_cache_cpu to a power of 2 bytes. There is also
>> the potential that other per cpu fetches to neighboring objects affect
>> the
>> situation. We could cacheline align the whole thing.
>>
>> ---
>> include/linux/slub_def.h | 5 +++++
>> 1 file changed, 5 insertions(+)
>>
>> Index: linux-2.6/include/linux/slub_def.h
>> ===================================================================
>> --- linux-2.6.orig/include/linux/slub_def.h 2010-04-07
>> 11:33:50.000000000 -0500
>> +++ linux-2.6/include/linux/slub_def.h 2010-04-07
>> 11:35:18.000000000 -0500
>> @@ -38,6 +38,11 @@ struct kmem_cache_cpu {
>> void **freelist; /* Pointer to first free per cpu object */
>> struct page *page; /* The slab from which we are allocating */
>> int node; /* The node of the page (or -1 for debug) */
>> +#ifndef CONFIG_64BIT
>> + int dummy1;
>> +#endif
>> + unsigned long dummy2;
>> +
>> #ifdef CONFIG_SLUB_STATS
>> unsigned stat[NR_SLUB_STAT_ITEMS];
>> #endif
>
> Would __cacheline_aligned_in_smp do the trick here?

Oh, sorry, I think it's actually '____cacheline_aligned_in_smp' (with
four underscores) for per-cpu data. Confusing...
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Pekka Enberg on
Christoph Lameter wrote:
> I wonder if this is not related to the kmem_cache_cpu structure straggling
> cache line boundaries under some conditions. On 2.6.33 the kmem_cache_cpu
> structure was larger and therefore tight packing resulted in different
> alignment.
>
> Could you see how the following patch affects the results. It attempts to
> increase the size of kmem_cache_cpu to a power of 2 bytes. There is also
> the potential that other per cpu fetches to neighboring objects affect the
> situation. We could cacheline align the whole thing.
>
> ---
> include/linux/slub_def.h | 5 +++++
> 1 file changed, 5 insertions(+)
>
> Index: linux-2.6/include/linux/slub_def.h
> ===================================================================
> --- linux-2.6.orig/include/linux/slub_def.h 2010-04-07 11:33:50.000000000 -0500
> +++ linux-2.6/include/linux/slub_def.h 2010-04-07 11:35:18.000000000 -0500
> @@ -38,6 +38,11 @@ struct kmem_cache_cpu {
> void **freelist; /* Pointer to first free per cpu object */
> struct page *page; /* The slab from which we are allocating */
> int node; /* The node of the page (or -1 for debug) */
> +#ifndef CONFIG_64BIT
> + int dummy1;
> +#endif
> + unsigned long dummy2;
> +
> #ifdef CONFIG_SLUB_STATS
> unsigned stat[NR_SLUB_STAT_ITEMS];
> #endif

Would __cacheline_aligned_in_smp do the trick here?
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Christoph Lameter on
On Wed, 7 Apr 2010, Pekka Enberg wrote:

> Christoph Lameter wrote:
> > I wonder if this is not related to the kmem_cache_cpu structure straggling
> > cache line boundaries under some conditions. On 2.6.33 the kmem_cache_cpu
> > structure was larger and therefore tight packing resulted in different
> > alignment.
> >
> > Could you see how the following patch affects the results. It attempts to
> > increase the size of kmem_cache_cpu to a power of 2 bytes. There is also
> > the potential that other per cpu fetches to neighboring objects affect the
> > situation. We could cacheline align the whole thing.
> >
> > ---
> > include/linux/slub_def.h | 5 +++++
> > 1 file changed, 5 insertions(+)
> >
> > Index: linux-2.6/include/linux/slub_def.h
> > ===================================================================
> > --- linux-2.6.orig/include/linux/slub_def.h 2010-04-07 11:33:50.000000000
> > -0500
> > +++ linux-2.6/include/linux/slub_def.h 2010-04-07 11:35:18.000000000
> > -0500
> > @@ -38,6 +38,11 @@ struct kmem_cache_cpu {
> > void **freelist; /* Pointer to first free per cpu object */
> > struct page *page; /* The slab from which we are allocating */
> > int node; /* The node of the page (or -1 for debug) */
> > +#ifndef CONFIG_64BIT
> > + int dummy1;
> > +#endif
> > + unsigned long dummy2;
> > +
> > #ifdef CONFIG_SLUB_STATS
> > unsigned stat[NR_SLUB_STAT_ITEMS];
> > #endif
>
> Would __cacheline_aligned_in_smp do the trick here?

This is allocated via the percpu allocator. We could specify cacheline
alignment there but that would reduce the density. You basically need 4
words for a kmem_cache_cpu structure. A number of those fit into one 64
byte cacheline.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Pekka Enberg on
Christoph Lameter wrote:
> On Wed, 7 Apr 2010, Pekka Enberg wrote:
>
>> Oh, sorry, I think it's actually '____cacheline_aligned_in_smp' (with four
>> underscores) for per-cpu data. Confusing...
>
> This does not particulary help to clarify the situation since we are
> dealing with data that can either be allocated via the percpu allocator or
> be statically present (kmalloc bootstrap situation).

Yes, I am an idiot. :-)
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Christoph Lameter on
On Wed, 7 Apr 2010, Pekka Enberg wrote:

> Oh, sorry, I think it's actually '____cacheline_aligned_in_smp' (with four
> underscores) for per-cpu data. Confusing...

This does not particulary help to clarify the situation since we are
dealing with data that can either be allocated via the percpu allocator or
be statically present (kmalloc bootstrap situation).

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/