From: Nick Piggin on
On Fri, Apr 09, 2010 at 10:35:31AM +0200, Peter Zijlstra wrote:
> On Fri, 2010-04-09 at 14:14 +1000, Nick Piggin wrote:
> > On Thu, Apr 08, 2010 at 09:17:37PM +0200, Peter Zijlstra wrote:
> > > Hi,
> > >
> > > This (still incomplete) patch-set makes part of the mm a lot more preemptible.
> > > It converts i_mmap_lock and anon_vma->lock to mutexes. On the way there it
> > > also makes mmu_gather preemptible.
> > >
> > > The main motivation was making mm_take_all_locks() preemptible, since it
> > > appears people are nesting hundreds of spinlocks there.
> > >
> > > The side-effects are that we can finally make mmu_gather preemptible, something
> > > which lots of people have wanted to do for a long time.
> >
> > What's the straight-line performance impact of all this? And how about
> > concurrency, I wonder. mutexes of course are double the atomics, and
> > you've added a refcount which is two more again for those paths using
> > it.
> >
> > Page faults are very important. We unfortunately have some databases
> > doing a significant amount of mmap/munmap activity too.
>
> You think this would affect the mmap/munmap times in any significant
> way? It seems to me those are relatively heavy ops to begin with.

They're actually not _too_ heavy, because they just set up and tear
down vmas. No flushing or faulting required (well, in order to make
any _use_ of them you need flushing and faulting, of course).

I have some microbenchmarks like this and the page fault test I
could try.


> > I'd like to
> > see microbenchmark numbers for each of those (both anon and file backed
> > for page faults).
>
> OK, I'll dig out that fault test used in the whole mmap_sem/rwsem thread
> a while back and modify it to also do file backed faults.

That'd be good. Anonymous as well of course, for non-databases :)


> > kbuild does quite a few pages faults, that would be an easy thing to
> > test. Not sure what reasonable kinds of cases exercise parallelism.
> >
> >
> > > What kind of performance tests would people have me run on this to satisfy
> > > their need for numbers? I've done a kernel build on x86_64 and if anything that
> > > was slightly faster with these patches, but it was well within the noise
> > > levels so it might be heat noise I'm looking at ;-)
> >
> > Is it because you're reducing the number of TLB flushes, or what
> > (kbuild isn't multi threaded so on x86 TLB flushes should be really
> > fast anyway).
>
> I'll try and get some perf stat runs to get some insight into this. But
> the numbers were:
>
> time make O=defconfig -j48 bzImage (5x, cache hot)
>
> without: avg: 39.2018s +- 0.3407
> with: avg: 38.9886s +- 0.1814

Well, that's interesting. Nice if it is a real improvement and not just
some anomaly. I'd start by looking at TLB flushes, maybe? For testing,
it would be nice to make the flush sizes equal so you get more of a
comparison of the straight-line code.

Other than this, I don't have a good suggestion of what to test. I mean,
how far can you go? :) Some threaded workloads would probably be a good
idea, though. Java?

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Martin Schwidefsky on
Hi Peter,

On Thu, 08 Apr 2010 21:17:37 +0200
Peter Zijlstra <a.p.zijlstra(a)chello.nl> wrote:

> The side-effects are that we can finally make mmu_gather preemptible, something
> which lots of people have wanted to do for a long time.

Yes, that is a good thing. With the preemptible mmu_gather, s390 will
use fewer IDTEs (that is the instruction that flushes all TLB entries
for a given address space) on a full flush. Good :-)

> This patch-set seems to build and boot on my x86_64 machines and even builds a
> kernel. I've also attempted powerpc and sparc, which I've compile tested with
> their respective defconfigs, remaining are (afaikt the rest uses the generic
> tlb bits):
>
> - s390
> - ia64
> - arm
> - superh
> - um
>
> From those, s390 and ia64 look 'interesting', arm and superh seem very similar
> and should be relatively easy (-rt has a patchlet for arm iirc).

To get the 'interesting' TLB flushing on s390 working again you need
this patch:

--
[PATCH] s390: preemptible mmu_gather

From: Martin Schwidefsky <schwidefsky(a)de.ibm.com>

Adapt the stand-alone s390 mmu_gather implementation to the new
preemptible mmu_gather interface.

Signed-off-by: Martin Schwidefsky <schwidefsky(a)de.ibm.com>
---
arch/s390/include/asm/tlb.h | 43 +++++++++++++++++++++++++------------------
1 file changed, 25 insertions(+), 18 deletions(-)

--- a/arch/s390/include/asm/tlb.h
+++ b/arch/s390/include/asm/tlb.h
@@ -28,45 +28,51 @@
 #include <asm/smp.h>
 #include <asm/tlbflush.h>
 
-#ifndef CONFIG_SMP
-#define TLB_NR_PTRS 1
-#else
-#define TLB_NR_PTRS 508
-#endif
-
 struct mmu_gather {
 	struct mm_struct *mm;
 	unsigned int fullmm;
 	unsigned int nr_ptes;
 	unsigned int nr_pxds;
-	void *array[TLB_NR_PTRS];
+	unsigned int max;
+	void **array;
+	void *local[8];
 };
 
-DECLARE_PER_CPU(struct mmu_gather, mmu_gathers);
-
-static inline struct mmu_gather *tlb_gather_mmu(struct mm_struct *mm,
-						unsigned int full_mm_flush)
+static inline void __tlb_alloc_pages(struct mmu_gather *tlb)
 {
-	struct mmu_gather *tlb = &get_cpu_var(mmu_gathers);
+	unsigned long addr = __get_free_pages(GFP_ATOMIC, 0);
+
+	if (addr) {
+		tlb->array = (void *) addr;
+		tlb->max = PAGE_SIZE / sizeof(void *);
+	}
+}
 
+static inline void tlb_gather_mmu(struct mmu_gather *tlb,
+				  struct mm_struct *mm,
+				  unsigned int full_mm_flush)
+{
 	tlb->mm = mm;
+	tlb->max = ARRAY_SIZE(tlb->local);
+	tlb->array = tlb->local;
 	tlb->fullmm = full_mm_flush || (num_online_cpus() == 1) ||
 		(atomic_read(&mm->mm_users) <= 1 && mm == current->active_mm);
-	tlb->nr_ptes = 0;
-	tlb->nr_pxds = TLB_NR_PTRS;
 	if (tlb->fullmm)
 		__tlb_flush_mm(mm);
-	return tlb;
+	else
+		__tlb_alloc_pages(tlb);
+	tlb->nr_ptes = 0;
+	tlb->nr_pxds = tlb->max;
 }
 
 static inline void tlb_flush_mmu(struct mmu_gather *tlb,
 				 unsigned long start, unsigned long end)
 {
-	if (!tlb->fullmm && (tlb->nr_ptes > 0 || tlb->nr_pxds < TLB_NR_PTRS))
+	if (!tlb->fullmm && (tlb->nr_ptes > 0 || tlb->nr_pxds < tlb->max))
 		__tlb_flush_mm(tlb->mm);
 	while (tlb->nr_ptes > 0)
 		pte_free(tlb->mm, tlb->array[--tlb->nr_ptes]);
-	while (tlb->nr_pxds < TLB_NR_PTRS)
+	while (tlb->nr_pxds < tlb->max)
 		/* pgd_free frees the pointer as region or segment table */
 		pgd_free(tlb->mm, tlb->array[tlb->nr_pxds++]);
 }
@@ -79,7 +85,8 @@ static inline void tlb_finish_mmu(struct
 	/* keep the page table cache within bounds */
 	check_pgt_cache();
 
-	put_cpu_var(mmu_gathers);
+	if (tlb->array != tlb->local)
+		free_pages((unsigned long) tlb->array, 0);
 }

--
blue skies,
Martin.

"Reality continues to ruin my life." - Calvin.

From: Peter Zijlstra on
On Fri, 2010-04-09 at 18:50 +1000, Nick Piggin wrote:
> I have some microbenchmarks like this and the page fault test I
> could try.
>
If you don't mind, please. If you send them my way I'll run them on my
machines too.

From: David Howells on

Have you tried compiling this for NOMMU?

David
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Peter Zijlstra on
On Fri, 2010-04-09 at 10:03 +0100, David Howells wrote:
> Have you tried compiling this for NOMMU?

No, I will eventually, I'm sure there's something to fix.
