From: Kirill A. Shutemov on
On Fri, Jul 02, 2010 at 12:12:07AM +0100, Russell King wrote:
> On Fri, Jul 02, 2010 at 01:59:11AM +0300, Kirill A. Shutemov wrote:
> > On Thu, Jul 01, 2010 at 11:48:37PM +0100, Russell King wrote:
> > > One possibility is that for some reason you're using the legacy prefetch
> > > abort code or pre-IFSR code, which will always tell the kernel that its
> > > a translation fault - and in this case, this patch would improve the
> > > situation. What kernel version are you using?
> >
> > 2.6.32
>
> Should be recent enough.
>
> > If you run this test in loop on kernel without the patch you'll finally
> > get hung instead SIGSEGV.
> >
> > It seems the patch fixes more than it was written for. :)
>
> Have you investigated the IFSR and IFAR values, and the corresponding
> page table state? I'm not going to be able to run your test code for
> a few days.

I've investigated the issue. It's reproducible if you try to jump to
the megabyte next to section mapping.

On my OMAP3-based device I use address 0xfc1XXXXX for debugging propose.
It's a megabyte next to OMAP343X_SMS mapping.

On ARM one Linux PGD entry contains two hardware entry. But there is error
in do_translation_fault(). It's always call pmd_none() check for the first
entry of two, not for the entry corresponded to address. So in case if we
try to jump the megabyte next to section mapping, we will have inifinity
loop of translation faults.

Signed-off-by: Kirill A. Shutemov <kirill(a)shutemov.name>
---

diff --git a/arch/arm/mm/fault.c b/arch/arm/mm/fault.c
index 9634fe1..825b9da 100644
--- a/arch/arm/mm/fault.c
+++ b/arch/arm/mm/fault.c
@@ -406,7 +406,8 @@ do_translation_fault(unsigned long addr, unsigned int fsr,
pmd_k = pmd_offset(pgd_k, addr);
pmd = pmd_offset(pgd, addr);

- if (pmd_none(*pmd_k))
+ index = (addr >> SECTION_SHIFT) & 1;
+ if (pmd_none(pmd_k[index]))
goto bad_area;

copy_pmd(pmd, pmd_k);
--
Kirill A. Shutemov
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Kirill A. Shutemov on
On Tue, Jul 06, 2010 at 11:58:15PM +0100, Russell King wrote:
> On Tue, Jul 06, 2010 at 04:06:18PM +0300, Kirill A. Shutemov wrote:
> > I've investigated the issue. It's reproducible if you try to jump to
> > the megabyte next to section mapping.
>
> Okay, this is specific to the way that OMAP sets up its mappings, which
> is why it doesn't appear everywhere.
>
> > On ARM one Linux PGD entry contains two hardware entry. But there is error
> > in do_translation_fault(). It's always call pmd_none() check for the first
> > entry of two, not for the entry corresponded to address. So in case if we
> > try to jump the megabyte next to section mapping, we will have inifinity
> > loop of translation faults.
>
> Okay, now that we know _why_ it happens, I'm satisfied that the fix
> previously committed will help this situation.
>
> > diff --git a/arch/arm/mm/fault.c b/arch/arm/mm/fault.c
> > index 9634fe1..825b9da 100644
> > --- a/arch/arm/mm/fault.c
> > +++ b/arch/arm/mm/fault.c
> > @@ -406,7 +406,8 @@ do_translation_fault(unsigned long addr, unsigned int fsr,
> > pmd_k = pmd_offset(pgd_k, addr);
> > pmd = pmd_offset(pgd, addr);
> >
> > - if (pmd_none(*pmd_k))
> > + index = (addr >> SECTION_SHIFT) & 1;
> > + if (pmd_none(pmd_k[index]))
>
> I do think this is extremely obscure, and therefore requires a comment
> to help people understand what is going on here and why. Leaving it
> in the commit log would be an invitation for this to be needlessly
> cut'n'pasted.

Ok, I'll fix it.

But it seems that the problem is more global. Potentially, any of
pmd_none() check may produce false results. I don't see an easy way to fix
it.

It's not so big problem since we don't have [super]section in userspace,
but I guess, we want to have huge pages support in the future.

Any ideas how to fix it in the right way?

Does Linux VM still expect one PTE table per page?

CC list modified. Removed persons who unlikely interested in ARM-specific
stuff. linux-arm-kernel and linux-mm added.

--
Kirill A. Shutemov
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Kirill A. Shutemov on
On Wed, Jul 07, 2010 at 11:34:18PM +0100, Russell King - ARM Linux wrote:
> On Wed, Jul 07, 2010 at 11:56:01AM +0300, Kirill A. Shutemov wrote:
> > But it seems that the problem is more global. Potentially, any of
> > pmd_none() check may produce false results. I don't see an easy way to fix
> > it.
>
> It isn't. We normally guarantee that we always fill on both L1 entries.
> The only exception is for the mappings specified via create_mapping()
> which is used for the static platform mappings.

Why do not to change create_mapping() to follow the same rules?
I mean, create sections only if it asked for 2*SECTION_SIZE with
appropriate alignment. It reduces number of section mappings, but,
probably, will be a bit cleaner and less error-prune.

> > Does Linux VM still expect one PTE table per page?
>
> Yes, and as far as I can see probably always will. Hence why we need
> to put two L1 entries in one page and lie to the kernel about the sizes
> of the hardware entries.

Another option is leave half of page with PTE table free. Is it very bad
idea?

How other architectures handle it? Or only on ARM PTL table size is less
than page size?

--
Kirill A. Shutemov
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Kirill A. Shutemov on
On Thu, Jul 08, 2010 at 02:31:22PM +0300, Kirill A. Shutemov wrote:
> On Wed, Jul 07, 2010 at 11:34:18PM +0100, Russell King - ARM Linux wrote:
> > On Wed, Jul 07, 2010 at 11:56:01AM +0300, Kirill A. Shutemov wrote:
> > > But it seems that the problem is more global. Potentially, any of
> > > pmd_none() check may produce false results. I don't see an easy way to fix
> > > it.
> >
> > It isn't. We normally guarantee that we always fill on both L1 entries.
> > The only exception is for the mappings specified via create_mapping()
> > which is used for the static platform mappings.
>
> Why do not to change create_mapping() to follow the same rules?
> I mean, create sections only if it asked for 2*SECTION_SIZE with
> appropriate alignment. It reduces number of section mappings, but,
> probably, will be a bit cleaner and less error-prune.
>
> > > Does Linux VM still expect one PTE table per page?
> >
> > Yes, and as far as I can see probably always will. Hence why we need
> > to put two L1 entries in one page and lie to the kernel about the sizes
> > of the hardware entries.
>
> Another option is leave half of page with PTE table free. Is it very bad
> idea?
>
> How other architectures handle it? Or only on ARM PTL table size is less
> than page size?

Russell, any comments? I would like to fix it in a right way.

--
Kirill A. Shutemov
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/