arm: msm: Add MSM IOMMU support. [Kernel]

Prev: [PATCH] tracing: extend recordmcount to better support Blackfin mcount
Next: typo in crypto kconfig ?

From: Stepan Moskovchenko on 9 Aug 2010 01:20

Russell,

Tomorrow I will fix the style / iomem / other issues tomorrow and clean up
the failure paths. I have some questions for you, inline.

>> + iommu_drvdata = dev_get_drvdata(ctx_drvdata->pdev->dev.parent);
>
> Please explain what's going on here with parent devices. What device is
> pdev referring to, and what is pdev->dev.parent ? It seems quite dodgy
> to just expect a certain device relationship without checking that the
> device is what is expected (eg, it's non-NULL, it's of the right bus type,
> and it appears to be suitably named) before dereferencing its driver data.

I could see why this seems confusing. pdev->dev.parent is the IOMMU
context's parent device. There is a device hierarchy here. Each IOMMU
device has multiple IOMMU *context* devices as children. These are
physical, actual translation contexts, with their own distinct sets of
registers. The parent device (the IOMMU) also has a number of "global"
registers that are shared among the translation contexts. It is
impractical to lump the global register space and the context bank
registers into one device because the translation contexts are effectively
their own IOMMUs that can be separately managed. Additionally, the number
of translation contexts varies among IOMMUs in the system. So, in this
particular case, the driver needs to know the ioremapped address of the
IOMMU, so it references the driverdata of the parent device (the IOMMU
itself) to get it. There will never be a context "by itself", ie, a
context's parent will always be an IOMMU device, so the operation of
referencing the parent's data will always be safe. But I can put in some
sanity checks for the pointers if you wish.

>> +static void msm_iommu_domain_destroy(struct iommu_domain *domain)
>> +{
>> + struct msm_priv *priv = (struct msm_priv *) domain->priv;
>
> struct msm_priv *priv = domain->priv;
>
> Should this be outside msm_iommu_lock?

domain->priv should always be unchanged if the domain is still "valid".
The contents of domain->priv may change, but domain->priv does not get
reassigned (until being set to null right as domain is being freed), so I
put this outside the spinlock. Arguably, you could have a problem if a
function is trying to use the domain *as it's being freed*, but then
bigger problems will arise. I will make the change, though.

>> + spin_unlock_irqrestore(&msm_iommu_lock, flags);
>> + return;
>> +fail_inval:
>> + spin_unlock_irqrestore(&msm_iommu_lock, flags);
>> + return;
>
> Does this need to be repeated?

I had initially operated under the assumption that unmap() could return a
failure code. I will clean this up.

>> +#ifndef CONFIG_IOMMU_PGTABLES_L2
>
> I think you mean:
> #ifndef CONFIG_IOMMU_PGTABLES_L1
>
> because as I've said to you several times now, flush_cache_all() is about
> flushing data out of the L1 cache only. It does NOT affect L2 cache.

I do mean to flush the L2 cache here (ie, flush the data from whatever
cache it may be in, all the way to RAM). I believed that
v7_flush_cache_all() (and hence, flush_cache_all() as was suggested by you
as a replacement) would flush the L1 and L2. The comment in cache-v7.S
suggests that the function will "Flush the entire cache system." (line 81)
which sounds like L2 ought to be included (and the observed behavior seems
to agree). I just need the pagetable in RAM to reflect the latest changes
made by the driver, so that the IOMMU can get to it if it hasn't been
configured to access the L2 cache on its own. Could you please suggest a
correct way to flush the L2 cache?

>> +static int msm_iommu_unmap(struct iommu_domain *domain, unsigned long
>> va,
>> + int gfp_order)
>
> What does 'gfp_order' have to do with get_free_pages in this function?
> Wouldn't just 'order' be more appropriate?

I called it 'gfp_order' because that is how linux/iommu.h names that
parameter in the IOMMU API. I guess 'order' *is* more appropriate... I'll
fix it.

>> + /* Upper 20 bits from PAR, lower 12 from VA */
>> + spin_unlock_irqrestore(&msm_iommu_lock, flags);
>> + return (par & 0xFFFFF000) | (va & 0x00000FFF);
>> +fail_nodev:
>> + spin_unlock_irqrestore(&msm_iommu_lock, flags);
>> + return -ENODEV;
>
> Hmm, returning -ve numbers and addresses... how do you tell the difference
> between the two in the calling function?

That is a good question. I am not sure how to handle the error conditions
in this case. My first idea was to just return 0 for all iova-to-phys
faults, but 0 too is a legitimate address (although kind of a dodgy one).
But it may be better to return 0 for the error case. Also, the translation
hardware will not generate a fault interrupt if it's doing a translation
at the request of the CPU (as opposed to when it's being addressed by a
client) but I can manually ask the IOMMU to generate a translation fault
interrupt if iova_to_phys results in a fault (at least we'll know
something is wrong). I am not sure which approach is the best, but am
leaning towards just returning 0 on errors. What do you think is best?

> I'm also sure you can do better with these exit paths in a similar way to
> what I've illustrated above.

Yup, I'll fix it. I will try to post a revised patch tomorrow, after I get
some sleep in me (I guess it will be the evening where you are).

Thanks
Steve

Sent by an employee of the Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

From: Joe Perches on 9 Aug 2010 23:10

On Mon, 2010-08-09 at 19:27 -0700, Stepan Moskovchenko wrote:
> Add support for the IOMMUs found on the upcoming Qualcomm
> MSM8x60 chips. These IOMMUs allow virtualization of the
> address space used by most of the multimedia cores on these
> chips. The driver implements the kernel's IOMMU API.
> diff --git a/arch/arm/mach-msm/iommu.c b/arch/arm/mach-msm/iommu.c
> new file mode 100644
> index 0000000..b22aecd
> --- /dev/null
> +++ b/arch/arm/mach-msm/iommu.c
> @@ -0,0 +1,601 @@

Why use both pr_err and printk(KERN_ERR ?

> +#define pr_fmt(fmt) "%s %i " fmt, __func__, __LINE__
> + pr_err("bad size: %d\n", len);
> + pr_err("null page table\n");
> + pr_err("could not allocate second level table\n");
> + pr_err("null page table\n");
> + pr_err("first level PTE is 0\n");
> + printk(KERN_ERR "FAR = %08x PAR = %08x\n",
> + printk(KERN_ERR "FSR = %08x [%s%s%s%s%s%s%s%s%s%s]\n", fsr,
> + printk(KERN_ERR "FSYNR0 = %08x FSYNR1 = %08x\n",
> + printk(KERN_ERR "TTBR0 = %08x TTBR1 = %08x\n",
> + printk(KERN_ERR "SCTLR = %08x ACTLR = %08x\n",
> + printk(KERN_ERR "PRRR = %08x NMRR = %08x\n",
> + pr_err("Invalid device ID in context interrupt handler\n");
> + printk(KERN_ERR "===== WOAH! =====\n");
> + printk(KERN_ERR "Unexpected IOMMU page fault!\n");
> + printk(KERN_ERR "base = %08x\n", (unsigned int) base);
> + printk(KERN_ERR "Fault occurred in context %d.\n", i);
> + printk(KERN_ERR "Interesting registers:\n");
> + printk(KERN_ERR "\n");

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

From: stepanm on 9 Aug 2010 23:50

> Why use both pr_err and printk(KERN_ERR ?

I had been using pr_err for printing badness messages internal to the
driver, but I guess their functionality is largely paralleled by return
error codes. I had been using printk(KERN_ERR for messages that have been
caused by an event that is more external to the driver and its callers
(like the data for the page fault, which is most likely to be caused by a
misprogrammed client). pr_err gives other information like function name,
line number, etc, which would be noise when looking at an iommu page
fault. And, I guess if the pr_err stuff got turned off, you would want the
page fault data to still be printed... but that's kind of a minor point.

I am actually considering doing away with all the pr_err lines entirely
because they typically also result in returning -Ewhatever and other IOMMU
drivers don't give this level of error reporting. But, they may be helpful
to the users of the API. What are people's thoughts on this?

Thanks
Steve

>> +#define pr_fmt(fmt) "%s %i " fmt, __func__, __LINE__
>> + pr_err("bad size: %d\n", len);
>> + pr_err("null page table\n");
>> + pr_err("could not allocate second level table\n");
>> + pr_err("null page table\n");
>> + pr_err("first level PTE is 0\n");
>> + printk(KERN_ERR "FAR = %08x PAR = %08x\n",
>> + printk(KERN_ERR "FSR = %08x [%s%s%s%s%s%s%s%s%s%s]\n", fsr,
>> + printk(KERN_ERR "FSYNR0 = %08x FSYNR1 = %08x\n",
>> + printk(KERN_ERR "TTBR0 = %08x TTBR1 = %08x\n",
>> + printk(KERN_ERR "SCTLR = %08x ACTLR = %08x\n",
>> + printk(KERN_ERR "PRRR = %08x NMRR = %08x\n",
>> + pr_err("Invalid device ID in context interrupt handler\n");
>> + printk(KERN_ERR "===== WOAH! =====\n");
>> + printk(KERN_ERR "Unexpected IOMMU page fault!\n");
>> + printk(KERN_ERR "base = %08x\n", (unsigned int) base);
>> + printk(KERN_ERR "Fault occurred in context %d.\n", i);
>> + printk(KERN_ERR "Interesting registers:\n");
>> + printk(KERN_ERR "\n");
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-arm-msm"
> in
> the body of a message to majordomo(a)vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

From: Joe Perches on 10 Aug 2010 01:40

On Mon, 2010-08-09 at 20:42 -0700, stepanm(a)codeaurora.org wrote:
> I am actually considering doing away with all the pr_err lines entirely
> because they typically also result in returning -Ewhatever and other IOMMU
> drivers don't give this level of error reporting. But, they may be helpful
> to the users of the API. What are people's thoughts on this?

I believe the current uses of pr_err in this module
are unnecessary or could be converted to pr_debug.

I think __func__/line aren't particularly useful.

I prefer log messages prefixed with the module name.

#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt

and I'd convert all the current pr_err to pr_debug
and convert the printk(KERN_ERR to pr_err(

cheers, Joe

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

From: stepanm on 10 Aug 2010 15:30

> On Mon, 2010-08-09 at 20:42 -0700, stepanm(a)codeaurora.org wrote:
> I believe the current uses of pr_err in this module
> are unnecessary or could be converted to pr_debug.
>
> I think __func__/line aren't particularly useful.
>
> I prefer log messages prefixed with the module name.
>
> #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
>
> and I'd convert all the current pr_err to pr_debug
> and convert the printk(KERN_ERR to pr_err(
>
> cheers, Joe

Thanks. Fixed in v4, which will go up in a few minutes.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

First | Prev | Next | Last
Pages: 1 2 3
Prev: [PATCH] tracing: extend recordmcount to better support Blackfin mcount
Next: typo in crypto kconfig ?