From: Nick Piggin on
It's not wrong per se, but the entire powerpc memory management code
does the IRQ disabling for its pagetable RCU code. So I think it would be
better to do the whole thing in one go.

I don't think Paul will surprise-break powerpc :)

It's up to Ben really, though.


On Thu, Apr 08, 2010 at 09:17:38PM +0200, Peter Zijlstra wrote:
> The powerpc page table freeing relies on the fact that IRQs hold off
> an RCU grace period. This is currently true for all existing RCU
> implementations, but it is not an assumption Paul wants to support.
>
> Therefore, also take the RCU read lock along with disabling IRQs to
> ensure the RCU grace period does at least cover these lookups.
>
> Requested-by: Paul E. McKenney <paulmck(a)linux.vnet.ibm.com>
> Signed-off-by: Peter Zijlstra <a.p.zijlstra(a)chello.nl>
> Cc: Nick Piggin <npiggin(a)suse.de>
> Cc: Benjamin Herrenschmidt <benh(a)kernel.crashing.org>
> ---
> arch/powerpc/mm/gup.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> Index: linux-2.6/arch/powerpc/mm/gup.c
> ===================================================================
> --- linux-2.6.orig/arch/powerpc/mm/gup.c
> +++ linux-2.6/arch/powerpc/mm/gup.c
> @@ -142,6 +142,7 @@ int get_user_pages_fast(unsigned long st
> * So long as we atomically load page table pointers versus teardown,
> * we can follow the address down to the the page and take a ref on it.
> */
> + rcu_read_lock();
> local_irq_disable();
>
> pgdp = pgd_offset(mm, addr);
> @@ -162,6 +163,7 @@ int get_user_pages_fast(unsigned long st
> } while (pgdp++, addr = next, addr != end);
>
> local_irq_enable();
> + rcu_read_unlock();
>
> VM_BUG_ON(nr != (end - start) >> PAGE_SHIFT);
> return nr;
> @@ -171,6 +173,7 @@ int get_user_pages_fast(unsigned long st
>
> slow:
> local_irq_enable();
> + rcu_read_unlock();
> slow_irqon:
> pr_devel(" slow path ! nr = %d\n", nr);
>
>
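The control flow this patch relies on can be sketched as a small user-space C model. This is a hypothetical illustration, not kernel code: the mock_* functions only track state and stand in for rcu_read_lock()/local_irq_disable(), and the early_fail/walk_fail flags are assumptions standing in for the real access checks and page-table walk. It shows why the placement matters: rcu_read_unlock() must sit on every path that re-enables IRQs, but above the slow_irqon label, which (in this sketch) is reached only before either protection has been taken.

```c
#include <assert.h>
#include <stdbool.h>

/* Mock primitives (hypothetical): they only record state so that the
 * exit paths can be checked. They are not the kernel's primitives. */
static int rcu_nesting;
static bool irqs_off;

static void mock_rcu_read_lock(void)     { rcu_nesting++; }
static void mock_rcu_read_unlock(void)   { assert(rcu_nesting > 0); rcu_nesting--; }
static void mock_local_irq_disable(void) { irqs_off = true; }
static void mock_local_irq_enable(void)  { irqs_off = false; }

/*
 * Control-flow skeleton of the patched get_user_pages_fast().
 * early_fail: an early check fails before IRQs are disabled (goto slow_irqon).
 * walk_fail:  the lockless page-table walk gives up (goto slow).
 * Returns 1 for the fast path, 0 when falling back to the slow path.
 */
static int gup_fast_model(bool early_fail, bool walk_fail)
{
	if (early_fail)
		goto slow_irqon;

	mock_rcu_read_lock();		/* taken first... */
	mock_local_irq_disable();

	/* The lockless pgd/pud/pmd/pte walk would happen here. */
	if (walk_fail)
		goto slow;

	mock_local_irq_enable();
	mock_rcu_read_unlock();		/* ...and released last */
	return 1;

slow:
	mock_local_irq_enable();
	mock_rcu_read_unlock();
slow_irqon:
	/* Here IRQs are enabled and the RCU read lock is not held. */
	return 0;
}
```

Calling gup_fast_model() with each combination of flags should leave rcu_nesting at 0 and irqs_off false; moving mock_rcu_read_unlock() below slow_irqon would trip the assert on the early_fail path.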
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Benjamin Herrenschmidt on
On Thu, 2010-04-08 at 21:17 +0200, Peter Zijlstra wrote:
> plain text document attachment (powerpc-gup_fast-rcu.patch)
> The powerpc page table freeing relies on the fact that IRQs hold off
> an RCU grace period. This is currently true for all existing RCU
> implementations, but it is not an assumption Paul wants to support.
>
> Therefore, also take the RCU read lock along with disabling IRQs to
> ensure the RCU grace period does at least cover these lookups.

There are a few other places that need a similar fix, then. The hash page
code for example. All the C cases should end up calling the
find_linux_pte() helper afaik, so we should be able to stick the lock in
there (and the hugetlbfs variant, find_linux_pte_or_hugepte()).

However, we also have cases of tight asm code walking the page tables,
such as the tlb miss handler on embedded processors. I don't see how I
could do that there. I.e., I only have a handful of registers to play
with, no stack, etc...

So we might have to support the interrupt assumption, at least in some
form, with those guys...

Cheers,
Ben.

From: Paul E. McKenney on
On Tue, Apr 13, 2010 at 11:05:31AM +1000, Benjamin Herrenschmidt wrote:
> On Thu, 2010-04-08 at 21:17 +0200, Peter Zijlstra wrote:
> > plain text document attachment (powerpc-gup_fast-rcu.patch)
> > The powerpc page table freeing relies on the fact that IRQs hold off
> > an RCU grace period. This is currently true for all existing RCU
> > implementations, but it is not an assumption Paul wants to support.
> >
> > Therefore, also take the RCU read lock along with disabling IRQs to
> > ensure the RCU grace period does at least cover these lookups.
>
> There are a few other places that need a similar fix, then. The hash page
> code for example. All the C cases should end up calling the
> find_linux_pte() helper afaik, so we should be able to stick the lock in
> there (and the hugetlbfs variant, find_linux_pte_or_hugepte()).
>
> However, we also have cases of tight asm code walking the page tables,
> such as the tlb miss handler on embedded processors. I don't see how I
> could do that there. I.e., I only have a handful of registers to play
> with, no stack, etc...
>
> So we might have to support the interrupt assumption, at least in some
> form, with those guys...

One way to make the interrupt assumption official is to use
synchronize_sched() rather than synchronize_rcu().

Thanx, Paul

From: Peter Zijlstra on
On Mon, 2010-04-12 at 20:43 -0700, Paul E. McKenney wrote:
> > So we might have to support the interrupt assumption, at least in some
> > form, with those guys...
>
> One way to make the interrupt assumption official is to use
> synchronize_sched() rather than synchronize_rcu().

Well, call_rcu_sched() then, because the current usage is to use
call_rcu() to free the page directories.
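Freeing the page directories with call_rcu_sched() would make the IRQ assumption official because a CPU with IRQs disabled cannot pass through an RCU-sched quiescent state. A toy user-space model can illustrate this. It is a deliberately simplified, hypothetical sketch: struct cpu_model, the NR_CPUS value, and the model_* functions are not kernel APIs, and real RCU also counts idle and user-mode execution as quiescent states, which this sketch ignores.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define NR_CPUS 2	/* hypothetical two-CPU system */

struct cpu_model {
	bool irqs_off;	/* CPU is inside a local_irq_disable() region */
	bool passed_qs;	/* CPU reported a quiescent state since the GP began */
};

static struct cpu_model cpus[NR_CPUS];
static void *pending_free;	/* one deferred object, e.g. a page directory */

/* Stand-in for call_rcu_sched(): queue obj and start a grace period. */
static void model_call_rcu_sched(void *obj)
{
	assert(pending_free == NULL);	/* this toy holds one object at a time */
	pending_free = obj;
	for (int i = 0; i < NR_CPUS; i++)
		cpus[i].passed_qs = false;
}

/* Stand-in for the scheduler tick: with IRQs off it cannot be taken,
 * so no quiescent state is reported for that CPU. */
static void model_tick(int cpu)
{
	if (!cpus[cpu].irqs_off)
		cpus[cpu].passed_qs = true;
}

/* Free the deferred object once every CPU has passed a quiescent state.
 * Returns true if the object was freed. */
static bool model_try_complete_gp(void)
{
	if (pending_free == NULL)
		return false;
	for (int i = 0; i < NR_CPUS; i++)
		if (!cpus[i].passed_qs)
			return false;
	free(pending_free);
	pending_free = NULL;
	return true;
}
```

For example, if CPU 1 is walking page tables with IRQs disabled when the grace period starts, ticks on both CPUs leave the object pending; only after CPU 1 re-enables IRQs and takes a tick does model_try_complete_gp() free it.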

Paul, there is a call_rcu_sched() available in kernel/rcutree.c, but am I
right in reading that code to mean that it would not be available for
preemptible RCU?


From: Paul E. McKenney on
On Wed, Apr 14, 2010 at 03:51:50PM +0200, Peter Zijlstra wrote:
> On Mon, 2010-04-12 at 20:43 -0700, Paul E. McKenney wrote:
> > > So we might have to support the interrupt assumption, at least in some
> > > form, with those guys...
> >
> > One way to make the interrupt assumption official is to use
> > synchronize_sched() rather than synchronize_rcu().
>
> Well, call_rcu_sched() then, because the current usage is to use
> call_rcu() to free the page directories.
>
> Paul, there is a call_rcu_sched() available in kernel/rcutree.c, but am I
> right in reading that code to mean that it would not be available for
> preemptible RCU?

Both call_rcu_sched() and call_rcu() are always there for you. ;-)

o If CONFIG_TREE_RCU (or CONFIG_TINY_RCU), they both have the same
implementation.

o If CONFIG_TREE_PREEMPT_RCU, call_rcu() is preemptible and
call_rcu_sched() is not.

Of course, with call_rcu_sched(), the corresponding RCU read-side critical
sections are non-preemptible. Therefore, in CONFIG_PREEMPT_RT, these
read-side critical sections must use raw spinlocks.
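The guarantees described here can be summarized as a small lookup function. It is a hypothetical encoding for 2010-era RCU (the enum names, the READER_* flags, and gp_waits_for() are illustrative, not kernel APIs), and it answers only whether a grace period is guaranteed to wait for a reader, not whether a particular implementation happens to. Under this model, a reader that both takes rcu_read_lock() and disables IRQs, as Peter's gup patch does, is covered by either flavor.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of 2010-era RCU guarantees; not a kernel API. */
enum rcu_config { TREE_RCU, TINY_RCU, TREE_PREEMPT_RCU };

enum gp_flavor {
	GP_RCU,		/* call_rcu() / synchronize_rcu() */
	GP_SCHED,	/* call_rcu_sched() / synchronize_sched() */
};

/* Protection held by a reader, as a bitmask. */
#define READER_RCU_LOCK 0x1u	/* inside rcu_read_lock() */
#define READER_IRQS_OFF 0x2u	/* inside local_irq_disable() */

/*
 * Returns true if a grace period of the given flavor is guaranteed to
 * wait for a reader holding the given protection under the given config.
 */
static bool gp_waits_for(enum rcu_config config, enum gp_flavor flavor,
			 unsigned int reader)
{
	if (config != TREE_PREEMPT_RCU) {
		/* Both flavors share one implementation and rcu_read_lock()
		 * readers are non-preemptible, so either protection works. */
		return reader != 0;
	}
	if (flavor == GP_RCU)
		return (reader & READER_RCU_LOCK) != 0;	/* preemptible readers */
	return (reader & READER_IRQS_OFF) != 0;		/* non-preemptible regions */
}
```

For instance, gp_waits_for(TREE_PREEMPT_RCU, GP_RCU, READER_IRQS_OFF) returns false, which is the assumption Paul does not want to support, while the same reader is covered under GP_SCHED.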

Can the code in question accommodate these restrictions?

Thanx, Paul