From: KAMEZAWA Hiroyuki on
On Mon, 14 Jun 2010 09:18:23 +0900
KAMEZAWA Hiroyuki <kamezawa.hiroyu(a)jp.fujitsu.com> wrote:

> On Fri, 11 Jun 2010 10:43:27 -0700
> "H. Peter Anvin" <hpa(a)zytor.com> wrote:
>
> > On 06/11/2010 02:20 AM, Kenji Kaneshige wrote:
> > > If the physical address is too high to be handled by ioremap() in
> > > x86_32 PAE (e.g. more than a 36-bit physical address), ioremap() must
> > > return an error (NULL). However, the current x86 ioremap tries to map
> > > this too-high physical address, and it causes unexpected behavior.
> >
> > What unexpected behavior? It is perfectly legitimate to map such a
> > high address in PAE mode. We have a 36-bit kernel-imposed limit on
> > *RAM* in 32-bit mode (because we can't manage more than that), but there
> > is no reason it should apply to I/O.
> >
>
> I'm sorry for my lack of study.

Now, __ioremap_caller() takes a 64-bit argument as the physical address.
== 2.6.35-rc2 ==
static void __iomem *__ioremap_caller(resource_size_t phys_addr,
		unsigned long size, unsigned long prot_val, void *caller)
{
==

And it checks whether the physical address is valid, based on the value obtained from cpuid.
==
	if (!phys_addr_valid(phys_addr)) {
		printk(KERN_WARNING "ioremap: invalid physical address %llx\n",
		       (unsigned long long)phys_addr);
		WARN_ON_ONCE(1);
		return NULL;
	}
==


First, this seems buggy.
==
	/*
	 * Don't allow anybody to remap normal RAM that we're using..
	 */
	for (pfn = phys_addr >> PAGE_SHIFT;
	     (pfn << PAGE_SHIFT) < (last_addr & PAGE_MASK);
	     pfn++) {

		int is_ram = page_is_ram(pfn);

		if (is_ram && pfn_valid(pfn) && !PageReserved(pfn_to_page(pfn)))
			return NULL;
		WARN_ON_ONCE(is_ram);
	}
==
If phys_addr is above 44 bits, pfn needs to be 64-bit, doesn't it?
Then page_is_ram() should take a 64-bit argument.

But for now, assume phys_addr is below 44 bits and continue.

Next,
===
	offset = phys_addr & ~PAGE_MASK;
	phys_addr &= PAGE_MASK;
	size = PAGE_ALIGN(last_addr+1) - phys_addr;

This mask operation is bad, as patch 1 shows.

And finally, it calls get_vm_area_caller().

==
	/*
	 * Ok, go for it..
	 */
	area = get_vm_area_caller(size, VM_IOREMAP, caller);
	if (!area)
		goto err_free_memtype;
	area->phys_addr = phys_addr;
	vaddr = (unsigned long) area->addr;
==
Because area->phys_addr is 32-bit, this may break something when we unmap it later.

And the page table operation is this:
==
	if (ioremap_page_range(vaddr, vaddr + size, phys_addr, prot))
		goto err_free_area;
==

Let's see lib/ioremap.c
==
int ioremap_page_range(unsigned long addr,
		       unsigned long end, unsigned long phys_addr, pgprot_t prot)
{
	pgd_t *pgd;

==

Oh, phys_addr is unsigned long again... Then, Kenji-san, this is what's buggy, I think.

Thanks,
-Kame





--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Kenji Kaneshige on
(2010/06/14 17:23), Kenji Kaneshige wrote:
> (2010/06/14 15:38), Maciej W. Rozycki wrote:
>> On Mon, 14 Jun 2010, Kenji Kaneshige wrote:
>>
>>> - The architectural limit of the physical address in x86 32-bit mode is
>>> 40 bits (depends on the processor version).
>>
>> According to documentation I happen to have handy this limit is actually
>> 52 bits (and space is currently available in the data structures used for
>> a possible future extension up to 63 bits).
>
> Thank you for pointing it out. I misunderstood that.
>
> Now I think I need to add an additional check to see whether the
> specified physical address can be handled by the x86 ioremap(), instead
> of changing phys_addr_valid(). The code would be
>
> static void __iomem *__ioremap_caller(...)
> {
> 	...
> #if defined(CONFIG_X86_32) && defined(CONFIG_X86_PAE)
> 	if (phys_addr is higher than 36-bit) {
> 		printk(KERN_INFO "ioremap can't map physical address %llx\n",
> 		       (unsigned long long)phys_addr);
> 		return NULL;
> 	}
> #endif
> 	...
> }

Please ignore the above again. Sorry for the inconvenience.
According to the comment from H. Peter Anvin, the 36-bit limit is on
*RAM* in 32-bit mode. So this approach is wrong.

Now I guess there is a bug somewhere that doesn't handle physical
addresses higher than 32 bits properly...

Thanks,
Kenji Kaneshige

From: Kenji Kaneshige on
Thank you Hiroyuki.

So many bugs in ioremap()...

I will try again with those bugs fixed.

Thanks,
Kenji Kaneshige


(2010/06/14 17:59), KAMEZAWA Hiroyuki wrote:
> On Mon, 14 Jun 2010 09:18:23 +0900
> KAMEZAWA Hiroyuki<kamezawa.hiroyu(a)jp.fujitsu.com> wrote:
>
>> On Fri, 11 Jun 2010 10:43:27 -0700
>> "H. Peter Anvin"<hpa(a)zytor.com> wrote:
>>
>>> On 06/11/2010 02:20 AM, Kenji Kaneshige wrote:
>>>> If the physical address is too high to be handled by ioremap() in
>>>> x86_32 PAE (e.g. more than a 36-bit physical address), ioremap() must
>>>> return an error (NULL). However, the current x86 ioremap tries to map
>>>> this too-high physical address, and it causes unexpected behavior.
>>>
>>> What unexpected behavior? It is perfectly legitimate to map such a
>>> high address in PAE mode. We have a 36-bit kernel-imposed limit on
>>> *RAM* in 32-bit mode (because we can't manage more than that), but there
>>> is no reason it should apply to I/O.
>>>
>>
>> I'm sorry for my lack of study.
>
> Now, __ioremap_caller() takes a 64-bit argument as the physical address.
> == 2.6.35-rc2 ==
> static void __iomem *__ioremap_caller(resource_size_t phys_addr,
> 		unsigned long size, unsigned long prot_val, void *caller)
> {
> ==
>
> And it checks whether the physical address is valid, based on the value obtained from cpuid.
> ==
> 	if (!phys_addr_valid(phys_addr)) {
> 		printk(KERN_WARNING "ioremap: invalid physical address %llx\n",
> 		       (unsigned long long)phys_addr);
> 		WARN_ON_ONCE(1);
> 		return NULL;
> 	}
> ==
>
>
> First, this seems buggy.
> ==
> 	/*
> 	 * Don't allow anybody to remap normal RAM that we're using..
> 	 */
> 	for (pfn = phys_addr >> PAGE_SHIFT;
> 	     (pfn << PAGE_SHIFT) < (last_addr & PAGE_MASK);
> 	     pfn++) {
>
> 		int is_ram = page_is_ram(pfn);
>
> 		if (is_ram && pfn_valid(pfn) && !PageReserved(pfn_to_page(pfn)))
> 			return NULL;
> 		WARN_ON_ONCE(is_ram);
> 	}
> ==
> If phys_addr is above 44 bits, pfn needs to be 64-bit, doesn't it?
> Then page_is_ram() should take a 64-bit argument.
>
> But for now, assume phys_addr is below 44 bits and continue.
>
> Next,
> ===
> 	offset = phys_addr & ~PAGE_MASK;
> 	phys_addr &= PAGE_MASK;
> 	size = PAGE_ALIGN(last_addr+1) - phys_addr;
>
> This mask operation is bad, as patch 1 shows.
>
> And finally, it calls get_vm_area_caller().
>
> ==
> 	/*
> 	 * Ok, go for it..
> 	 */
> 	area = get_vm_area_caller(size, VM_IOREMAP, caller);
> 	if (!area)
> 		goto err_free_memtype;
> 	area->phys_addr = phys_addr;
> 	vaddr = (unsigned long) area->addr;
> ==
> Because area->phys_addr is 32-bit, this may break something when we unmap it later.
>
> And the page table operation is this:
> ==
> 	if (ioremap_page_range(vaddr, vaddr + size, phys_addr, prot))
> 		goto err_free_area;
> ==
>
> Let's see lib/ioremap.c
> ==
> int ioremap_page_range(unsigned long addr,
> 		       unsigned long end, unsigned long phys_addr, pgprot_t prot)
> {
> 	pgd_t *pgd;
>
> ==
>
> Oh, phys_addr is unsigned long again... Then, Kenji-san, this is what's buggy, I think.
>
> Thanks,
> -Kame


From: Kenji Kaneshige on
(2010/06/14 18:13), Kenji Kaneshige wrote:
> Thank you Hiroyuki.
>
> So many bugs in ioremap()...
>
> I will try again with those bugs fixed.
>
> Thanks,
> Kenji Kaneshige

The problem seems to be fixed by the following patch. This is still
under testing. I will post the patch as v2 after testing.

Thanks,
Kenji Kaneshige


Current x86 ioremap() doesn't handle physical addresses higher than
32-bit properly in X86_32 PAE mode. When a physical address higher than
32-bit is passed to ioremap(), the upper 32 bits of the physical address
are wrongly cleared. Due to this bug, ioremap() can map a wrong address
into the linear address space.

In my case, a 64-bit MMIO region was assigned to a PCI device (ioat
device) on my system. Because of the ioremap() bug, a wrong physical
address (instead of the MMIO region) was mapped into the linear address
space. Because of this, loading the ioatdma driver caused unexpected
behavior (kernel panic, kernel hangup, ...).

Signed-off-by: Kenji Kaneshige <kaneshige.kenji(a)jp.fujitsu.com>

---
 arch/x86/mm/ioremap.c   |   11 +++++------
 include/linux/io.h      |    4 ++--
 include/linux/vmalloc.h |    2 +-
 lib/ioremap.c           |   10 +++++-----
4 files changed, 13 insertions(+), 14 deletions(-)

Index: linux-2.6.34/arch/x86/mm/ioremap.c
===================================================================
--- linux-2.6.34.orig/arch/x86/mm/ioremap.c
+++ linux-2.6.34/arch/x86/mm/ioremap.c
@@ -62,7 +62,8 @@ int ioremap_change_attr(unsigned long va
 static void __iomem *__ioremap_caller(resource_size_t phys_addr,
 		unsigned long size, unsigned long prot_val, void *caller)
 {
-	unsigned long pfn, offset, vaddr;
+	u64 pfn, last_pfn;
+	unsigned long offset, vaddr;
 	resource_size_t last_addr;
 	const resource_size_t unaligned_phys_addr = phys_addr;
 	const unsigned long unaligned_size = size;
@@ -100,10 +101,8 @@ static void __iomem *__ioremap_caller(re
 	/*
 	 * Don't allow anybody to remap normal RAM that we're using..
 	 */
-	for (pfn = phys_addr >> PAGE_SHIFT;
-	     (pfn << PAGE_SHIFT) < (last_addr & PAGE_MASK);
-	     pfn++) {
-
+	last_pfn = last_addr >> PAGE_SHIFT;
+	for (pfn = phys_addr >> PAGE_SHIFT; pfn < last_pfn; pfn++) {
 		int is_ram = page_is_ram(pfn);
 
 		if (is_ram && pfn_valid(pfn) && !PageReserved(pfn_to_page(pfn)))
@@ -115,7 +114,7 @@ static void __iomem *__ioremap_caller(re
 	 * Mappings have to be page-aligned
 	 */
 	offset = phys_addr & ~PAGE_MASK;
-	phys_addr &= PAGE_MASK;
+	phys_addr &= PHYSICAL_PAGE_MASK;
 	size = PAGE_ALIGN(last_addr+1) - phys_addr;
 
 	retval = reserve_memtype(phys_addr, (u64)phys_addr + size,
Index: linux-2.6.34/include/linux/vmalloc.h
===================================================================
--- linux-2.6.34.orig/include/linux/vmalloc.h
+++ linux-2.6.34/include/linux/vmalloc.h
@@ -30,7 +30,7 @@ struct vm_struct {
 	unsigned long		flags;
 	struct page		**pages;
 	unsigned int		nr_pages;
-	unsigned long		phys_addr;
+	phys_addr_t		phys_addr;
 	void			*caller;
 };

Index: linux-2.6.34/lib/ioremap.c
===================================================================
--- linux-2.6.34.orig/lib/ioremap.c
+++ linux-2.6.34/lib/ioremap.c
@@ -13,10 +13,10 @@
 #include <asm/pgtable.h>
 
 static int ioremap_pte_range(pmd_t *pmd, unsigned long addr,
-		unsigned long end, unsigned long phys_addr, pgprot_t prot)
+		unsigned long end, phys_addr_t phys_addr, pgprot_t prot)
 {
 	pte_t *pte;
-	unsigned long pfn;
+	u64 pfn;
 
 	pfn = phys_addr >> PAGE_SHIFT;
 	pte = pte_alloc_kernel(pmd, addr);
@@ -31,7 +31,7 @@ static int ioremap_pte_range(pmd_t *pmd,
 }
 
 static inline int ioremap_pmd_range(pud_t *pud, unsigned long addr,
-		unsigned long end, unsigned long phys_addr, pgprot_t prot)
+		unsigned long end, phys_addr_t phys_addr, pgprot_t prot)
 {
 	pmd_t *pmd;
 	unsigned long next;
@@ -49,7 +49,7 @@ static inline int ioremap_pmd_range(pud_
 }
 
 static inline int ioremap_pud_range(pgd_t *pgd, unsigned long addr,
-		unsigned long end, unsigned long phys_addr, pgprot_t prot)
+		unsigned long end, phys_addr_t phys_addr, pgprot_t prot)
 {
 	pud_t *pud;
 	unsigned long next;
@@ -67,7 +67,7 @@ static inline int ioremap_pud_range(pgd_
 }
 
 int ioremap_page_range(unsigned long addr,
-		unsigned long end, unsigned long phys_addr, pgprot_t prot)
+		unsigned long end, phys_addr_t phys_addr, pgprot_t prot)
 {
 	pgd_t *pgd;
 	unsigned long start;
Index: linux-2.6.34/include/linux/io.h
===================================================================
--- linux-2.6.34.orig/include/linux/io.h
+++ linux-2.6.34/include/linux/io.h
@@ -29,10 +29,10 @@ void __iowrite64_copy(void __iomem *to,
 
 #ifdef CONFIG_MMU
 int ioremap_page_range(unsigned long addr, unsigned long end,
-		       unsigned long phys_addr, pgprot_t prot);
+		       phys_addr_t phys_addr, pgprot_t prot);
 #else
 static inline int ioremap_page_range(unsigned long addr, unsigned long end,
-				     unsigned long phys_addr, pgprot_t prot)
+				     phys_addr_t phys_addr, pgprot_t prot)
 {
 	return 0;
 }


From: H. Peter Anvin on
On 06/14/2010 01:27 AM, Kenji Kaneshige wrote:
>
> Please ignore the above. I misunderstood the architectural limit of the
> physical address in x86 32-bit mode. It is not 40 bits, but 52 bits.
>

Well, it can be anywhere from 32 to 52 bits, depending on the hardware.

-hpa


--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.
