From: James Bottomley
On Thu, 2010-02-25 at 08:12 +1100, Benjamin Herrenschmidt wrote:
> On Wed, 2010-02-24 at 08:16 +0100, Oliver Neukum wrote:
> > I don't know. The issue seems quite complex. It would seem better to
> > centralize it as far as practical. Do you have a wrapper drivers could
> > call?
>
> flush_dcache_page() ? :-)

Actually, that can be wrong depending on the implementation. The
problem is incoherency of the kernel page (dirty) with respect to user
space aliases (clean). What has to happen on parisc is that the kernel
alias needs flushing. We can guarantee that the userspace aliases are
clean (or not mapped in at all). We wouldn't want to incur the expense of
flushing the user space pages as well.
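
To make that concrete, here is a minimal sketch of the pattern from a
driver's point of view, assuming the driver knows it dirtied the page
through the kernel mapping; flush_kernel_dcache_page() is the existing
helper that flushes only the kernel alias (a no-op on coherent
architectures), and the copy helper itself is purely illustrative:

#include <linux/highmem.h>
#include <linux/string.h>

/* Illustrative: the CPU (not DMA) dirtied a page through the kernel
 * mapping, so only the kernel alias needs flushing afterwards. */
static void pio_copy_to_page(struct page *page, unsigned int offset,
                             const void *src, size_t len)
{
        void *dst = kmap(page);

        memcpy(dst + offset, src, len);
        flush_kernel_dcache_page(page);     /* kernel alias only */
        kunmap(page);
}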

> Now, the subsystem might be the one to know whether something is mapped
> into userspace or not (v4l in our case) in which case a wrapper could be
> created.

Right, so it's the responsibility of the API used by the subsystem.
Thus Catalin's pio_kmap seems the right one ... I don't understand what
the additional problems are.
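
For reference, something like this is about all the wrapper would need
to be; pio_kmap()/pio_kunmap() are hypothetical names from this thread,
not a mainline API:

#include <linux/highmem.h>

/* Hypothetical (proposed in this thread, not mainline): a kmap variant
 * for PIO that takes care of the kernel-alias flush on unmap, so the
 * subsystem rather than each HCD carries the cache responsibility. */
static inline void *pio_kmap(struct page *page)
{
        return kmap(page);
}

static inline void pio_kunmap(struct page *page)
{
        flush_kernel_dcache_page(page);     /* CPU dirtied the kernel alias */
        kunmap(page);
}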

James


From: Catalin Marinas
On Wed, 2010-02-24 at 21:13 +0000, Benjamin Herrenschmidt wrote:
> On Wed, 2010-02-24 at 11:19 -0500, Alan Stern wrote:
> > > It is but I'm not confident the responsibility for doing that cleanup
> > > is at the HCD level. That would impact a lot of HCD activities that
> > > don't need such flushing since the use of the page is purely in-kernel.
> >
> > That's right. The HCD merely puts data wherever it's told to. It
> > doesn't know whether the destination is in the page cache, in
> > userspace, or anywhere else. The same is true for usb-storage.
>
> I'm surprised that usb-storage has an issue here. It shouldn't afaik,
> since it's just a SCSI driver (or not anymore ?) and the BIO or
> filesystems handle things there, no? I haven't seen a single call to
> flush_dcache_page() in any of drivers/scsi, drivers/ata or drivers/ide
> when I looked...

Neither the BIO layer nor most filesystem code calls flush_dcache_page()
either (well, some filesystems do, like cramfs or jffs2, but that's
because they decompress the data they receive from the block device).
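
A simplified sketch of that cramfs/jffs2-style pattern (the decompress
helper here is hypothetical, and error handling is trimmed): because the
filesystem itself dirties the page cache page through the kernel
mapping, it is the one that calls flush_dcache_page():

#include <linux/mm.h>
#include <linux/highmem.h>
#include <linux/pagemap.h>

static int readpage_decompressed(struct page *page,
                                 const void *cdata, size_t clen)
{
        void *dst = kmap(page);
        int err = my_decompress(dst, PAGE_SIZE, cdata, clen); /* hypothetical */

        kunmap(page);
        flush_dcache_page(page);        /* the kernel wrote this page */
        if (!err)
                SetPageUptodate(page);
        return err;
}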

--
Catalin

From: Catalin Marinas
On Wed, 2010-02-24 at 02:47 +0000, Benjamin Herrenschmidt wrote:
> On Fri, 2010-02-19 at 17:36 +0000, Catalin Marinas wrote:
> >
> > If a page is already mapped in user space, flush_dcache_page() on ARM
> > does the flushing rather than deferring it to update_mmu_cache().
>
> This is for D-cache aliases on VIVT right ? Or are you still talking
> about I/D coherency on PIPT ARMs ? Because the later should not matter
> for already mapped userspace pages in the sense that if user space
> explicitly read() into a page, it's up to userspace to cache-clean that
> page before executing from it, in my book :-)

I was still thinking about PIPT I/D coherency. The read() case you
mention is pretty clear: there's no need for the kernel to ensure
coherency (especially since the writing is done via copy_to_user() rather
than directly to the page cache page).

For mmap'ed pages (that are present in the page cache), is it guaranteed
that the HCD driver won't write to them once they have been mapped into
user space? If that's the case, we may be able to solve the problem by
just reversing the meaning of PG_arch_1 on ARM and assuming that a newly
allocated page has a dirty D-cache by default.
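
A rough sketch of that reversal, just to make the idea concrete (the
arch-internal helper name is illustrative, not the actual arch/arm
code):

#include <linux/mm.h>
#include <linux/bitops.h>
#include <linux/page-flags.h>

#define PG_dcache_clean PG_arch_1   /* reversed polarity: bit set == clean */

void flush_dcache_page(struct page *page)
{
        if (page_mapped(page))
                __arch_flush_dcache_page(page); /* illustrative helper */
        else
                clear_bit(PG_dcache_clean, &page->flags); /* defer */
}

/* update_mmu_cache() then flushes when PG_dcache_clean is not set and
 * sets the bit afterwards; a freshly allocated page has the bit clear,
 * i.e. it is treated as D-cache dirty by default. */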

> > The PIO HCD drivers, however, don't call flush_dcache_page(). Is it possible
> > that the HCD could transfer data into a page cache page already mapped
> > in user space? My understanding is that the scenario above is possible.
>
> It is but I'm not confident the responsibility for doing that cleanup
> is at the HCD level. That would impact a lot of HCD activities that
> don't need such flushing since the use of the page is purely in-kernel.
>
> Though I suppose that could be optimized out in most cases using the page
> use count.
>
> But I still wonder whether it should be pushed down to the actual
> interface drivers, that's always been the case I believe. In fact, in
> the case of block ops, it's generally done at the BIO or even file
> system layer right ?

The filesystem layer does it only if it needs to touch the data written
by the block device (e.g. cramfs, jffs2). Some block device drivers call
flush_dcache_page() (like mmci.c) while others don't; those that use DMA
needn't, since the DMA API handles the flushing.
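
The block-driver side of that looks roughly like this (a simplified
mmci.c-style PIO read, not the actual driver code; highmem corner cases
omitted):

#include <linux/scatterlist.h>
#include <linux/highmem.h>
#include <linux/io.h>
#include <linux/types.h>

static void pio_read_sg(struct scatterlist *sg, void __iomem *fifo)
{
        u32 *buf = kmap(sg_page(sg)) + sg->offset;
        unsigned int words = sg->length / sizeof(u32);

        while (words--)
                *buf++ = readl(fifo);

        kunmap(sg_page(sg));
        flush_dcache_page(sg_page(sg)); /* CPU, not DMA, filled the page */
}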

--
Catalin

From: Catalin Marinas
On Wed, 2010-02-24 at 02:39 +0000, Benjamin Herrenschmidt wrote:
> On Fri, 2010-02-19 at 17:15 +0000, Catalin Marinas wrote:
> > > We assume that anybody that dirties a page in the kernel will call
> > > flush_dcache_page() which removes our PG_arch_1 bit thus marking the
> > > page "dirty".
> >
> > This assumption is not valid with some drivers like USB HCD doing PIO.
> > But, yes, that's how it should be done.
>
> So we go back to the fix should be done at the individual drivers level.
> If it's going to write into the page cache, it needs to whack the bits.
>
> Now there's of course the question as to whether you really only want to
> do that for a PIO access and not for a DMA access. I think on power we
> don't really discriminate that much (since in any case our icache still
> needs flushing). Maybe it would be useful to separate the I$ and D$ bits
> but I'm not sure I can be bothered.

On ARM, update_mmu_cache() invalidates the I-cache (if VM_EXEC is set)
independently of whether the D-cache was dirty, since we can get
speculative fetches into the I-cache before the page is even mapped.
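
In rough pseudocode (the helper names are approximate, not the verbatim
arch/arm implementation):

#include <linux/mm.h>

void update_mmu_cache(struct vm_area_struct *vma,
                      unsigned long addr, pte_t pte)
{
        struct page *page = pfn_to_page(pte_pfn(pte));

        /* Flush a deferred-dirty D-cache page first. */
        if (dcache_is_dirty(page))              /* illustrative test */
                __arch_flush_dcache_page(page); /* illustrative helper */

        /* Invalidate the I-cache for executable mappings regardless of
         * the D-cache state: speculative fetches may have filled it
         * before the page was even mapped. */
        if (vma->vm_flags & VM_EXEC)
                __flush_icache_all();
}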

> > > Note that from experience, doing the check & flushes in
> > > update_mmu_cache() is racy on SMP. At least for I$/D$, we have the case
> > > where processor one does set_pte followed by update_mmu_cache(). The
> > > latter isn't done yet but processor 2 sees the PTE now and starts using
> > > it, cache hasn't been fully flushed yet. You may avoid that race in some
> > > ways, but on ppc, I've stopped using that.
> >
> > I think that's possible on ARM too. Having two threads on different
> > CPUs, one thread triggers a prefetch abort (instruction page fault) on
> > CPU0 but the second thread on CPU1 may branch into this page after
> > set_pte() (hence not faulting) but before update_mmu_cache() has done
> > the flush.
> >
> > On ARM11MPCore we flush the caches in flush_dcache_page() because the
> > cache maintenance operations weren't visible to the other CPUs.
>
> I'm not even sure that's going to be 100% correct. Don't you also need
> to flush the remote icaches when you are dealing with instructions (such
> as swap) anyway?

I don't think we tried swap, but for pages that are mapped for the first
time the I-cache would be clean. At mm switch time, if a thread has
migrated to a new CPU, we invalidate that CPU's I-cache.
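
That migration rule can be sketched like this (illustrative, using the
generic mm_cpumask() helper; the function name is hypothetical):

#include <linux/mm_types.h>
#include <linux/cpumask.h>

/* Called on the context-switch path: the first time an mm runs on a
 * given CPU, invalidate that CPU's I-cache so no stale lines survive a
 * thread migration. */
static inline void icache_sync_on_migrate(struct mm_struct *next,
                                          unsigned int cpu)
{
        if (!cpumask_test_and_set_cpu(cpu, mm_cpumask(next)))
                __flush_icache_all();
}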

> I've had some discussions in the past with Russell and others around the
> problem of non-broadcast cache ops on ARM SMP since that's also hurting
> you hard with dma mappings.
>
> Can you issue IPIs as FIQs if needed (from my old ARM knowledge, FIQs
> are still on even in local_irq_save() blocks, right? I haven't touched
> low-level ARM for years though, I may have forgotten things).

I have a patch for using IPIs via IRQ from the DMA API functions but,
while it works, it can deadlock with some drivers (complex situation).
Note that the patch added a specific IPI implementation which can cope
with interrupts being disabled (unlike the generic one).

My latest solution - http://bit.ly/apJv3O - is to use dummy
read-for-ownership or write-for-ownership accesses in the DMA cache
flushing functions to force cache line migration from the other CPUs.
Our current benchmarks show only around a 10% disk throughput penalty
compared to the normal SMP case (compared to the UP case the penalty is
bigger, but that's due to other things).
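
The trick can be sketched as follows (illustrative; the line size and
function name are assumptions): before the purely local cache
maintenance, the flushing CPU touches every line in the buffer, forcing
any copy held dirty by the other CPU to migrate over, after which a
local clean/invalidate is sufficient.

#include <linux/types.h>

#define CACHE_LINE_SIZE 32      /* assumed ARM11 MPCore line size */

/* Read-for-ownership: pull remote (possibly dirty) lines to this CPU
 * ahead of a local clean for DMA_TO_DEVICE; a dummy write per line
 * (write-for-ownership) would be used instead ahead of an invalidate. */
static void dma_rfo_range(const void *start, size_t len)
{
        const volatile char *p = start;
        const volatile char *end = p + len;

        while (p < end) {
                (void)*p;
                p += CACHE_LINE_SIZE;
        }
}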

--
Catalin

From: Alan Stern
On Fri, 26 Feb 2010, Catalin Marinas wrote:

> For mmap'ed pages (that are present in the page cache), is it guaranteed
> that the HCD driver won't write to them once they have been mapped into
> user space? If that's the case, we may be able to solve the problem by
> just reversing the meaning of PG_arch_1 on ARM and assuming that a newly
> allocated page has a dirty D-cache by default.

Nothing is guaranteed. The HCD will write wherever it is asked to. If
a driver does input to an mmap'ed page, the HCD won't even know that
the page is mmap'ed.

Alan Stern
