From: Pedro Ribeiro on
On 8 April 2010 17:57, Alan Stern <stern(a)rowland.harvard.edu> wrote:
> On Thu, 8 Apr 2010, Daniel Mack wrote:
>
>> > > AFAIK, the driver shouldn't have to worry about this at all. When the
>> > > buffer gets DMA-mapped for the controller, the DMA mapping code should
>> > > see that the device has a 32-bit DMA mask and either bounce or IOMMU-map
>> > > the memory so that it appears below 4GB.
>> >
>> > That's true.  It would of course be more efficient for the buffer to be
>> > allocated below 4 GB, but it should work okay either way.  Daniel, do
>> > you have any idea why it fails?
>>
>> No, and I can't run real tests because I lack a 64-bit machine. I'll do some
>> more investigation later today, but for now the only explanation I have
>> is that the EHCI code ends up using the physical address of the original
>> buffer rather than the remapped DMA buffer.
>>
>> It would of course be best to fix the whole problem at this level, if
>> possible.
>
> It definitely needs to be fixed at this level.  But I still think it's
> appropriate to have new USB core functions for allocating and
> deallocating I/O memory.  The additional price is small compared to
> constantly bouncing the buffers.
>
> Pedro, in the hope of tracking down the problem, can you apply this
> patch and see what output it produces in the system log when the
> "interference" happens?  (Warning: It will produce quite a lot of
> output whenever you send data to the audio device -- between 500 and
> 1000 lines per second.)
>
> Alan Stern
>
>
>
> Index: 2.6.33/drivers/usb/core/hcd.c
> ===================================================================
> --- 2.6.33.orig/drivers/usb/core/hcd.c
> +++ 2.6.33/drivers/usb/core/hcd.c
> @@ -1395,6 +1395,10 @@ int usb_hcd_submit_urb (struct urb *urb,
>                usbmon_urb_submit_error(&hcd->self, urb, status);
>                goto error;
>        }
> +       if (usb_endpoint_is_isoc_out(&urb->ep->desc))
> +               dev_info(&urb->dev->dev, "Iso xfer %p dma %llx\n",
> +                               urb->transfer_buffer,
> +                               (unsigned long long) urb->transfer_dma);
>
>        if (is_root_hub(urb->dev))
>                status = rh_urb_enqueue(hcd, urb);
>
>

Hi Alan,

Here is the output of the patch you sent me when the interference is triggered.

The log is long, 1.3 MB in size.

Please let me know if you need anything more.

Regards,
Pedro
From: Robert Hancock on
On 04/07/2010 06:33 PM, Greg KH wrote:
> On Wed, Apr 07, 2010 at 03:13:11PM -0400, Alan Stern wrote:
>> On Wed, 7 Apr 2010, Takashi Iwai wrote:
>>
>>>> Ok, I'll write some dummies for usb_malloc() and usb_zalloc() which
>>>> will just call kmalloc() with GFP_DMA32 for now.
>>>
>>> Can't we provide only zalloc() variant? Zero'ing doesn't cost much,
>>> and the buffer allocation shouldn't be called too often.
>>
>> Linus specifically requested us to avoid using kzalloc in usbfs. I
>> can't find the message in the email archives, but Greg KH should be
>> able to confirm it.
>>
>> As long as we're imitating kmalloc for one use, we might as well make
>> it available to all.
>>
>>>> And while at it,
>>>> usb_alloc_buffer() will be renamed to usb_alloc_consistent().
>>>
>>> Most of recent functions are named with "coherent".
>>
>> Yes, the terminology got a little confused between the PCI and DMA
>> realms. I agree, "coherent" is better.
>>
>> BTW, although some EHCI controllers may support 64-bit DMA, the driver
>> contains this:
>>
>> 	if (HCC_64BIT_ADDR(hcc_params)) {
>> 		ehci_writel(ehci, 0, &ehci->regs->segment);
>> #if 0
>> // this is deeply broken on almost all architectures
>> 		if (!dma_set_mask(hcd->self.controller, DMA_BIT_MASK(64)))
>> 			ehci_info(ehci, "enabled 64bit DMA\n");
>> #endif
>> 	}
>>
>> I don't know if the comment is still true, but until the "#if 0" is
>> removed, ehci-hcd won't make use of 64-bit DMA.
>
> I think someone tried to remove it recently, but I wouldn't let them :)
>
> What a mess, hopefully xhci will just take over and save the world from
> this whole thing...

True... except for the fact that the xhci driver currently doesn't do
64-bit DMA either, nor does it support MSI even though the HW supports
it (surprisingly enough the NEC Windows driver does, MSI-X even). At
this point only Intel likely knows how to do this properly, though,
since AFAICS the spec isn't publicly available yet.
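
For readers following the DMA-mask discussion, here is a minimal userspace
sketch of the check that decides whether a buffer is directly reachable by a
device or has to be bounced or IOMMU-remapped. It is not kernel code: the
helper names bit_mask() and dma_reachable() are made up for illustration, and
only the mask arithmetic mirrors the kernel's DMA_BIT_MASK() macro.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Low n bits set, same arithmetic as the kernel's DMA_BIT_MASK(n). */
static uint64_t bit_mask(unsigned int n)
{
	assert(n >= 1 && n <= 64);
	return (n == 64) ? ~0ULL : ((1ULL << n) - 1);
}

/*
 * Hypothetical helper: true if every byte of [addr, addr + len) is
 * addressable under `mask`, i.e. no bounce or remap would be needed.
 */
static bool dma_reachable(uint64_t addr, size_t len, uint64_t mask)
{
	uint64_t end;

	if (len == 0)
		return false;
	end = addr + (len - 1);		/* address of the last byte */
	if (end < addr)			/* wrapped around 2^64 */
		return false;
	return end <= mask;
}
```

With a 32-bit mask, a buffer at 0x100000000 (just above 4 GB) fails the check,
which is the case where the mapping layer is expected to step in.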
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Oliver Neukum on
Am Freitag, 9. April 2010 00:20:36 schrieb Alan Stern:
> > > That would work, but it doesn't match the way existing drivers use the
> > > interface. For example, the audio driver allocates a 16-byte coherent
> > > buffer and then uses four bytes from it for each of four different
> > > URBs.
> >
> > That will not work with any fallback that does not yield a coherent buffer.
>
> What you mean isn't entirely clear. But it certainly does work in
> various circumstances that don't yield coherent buffers. For example,
> it works if the controller uses PIO instead of DMA. It also works if
> the controller uses DMA and the URBs have to be bounced.

It'll work on x86. On incoherent architectures this violates the cacheline
rules for DMA-mapping if you have to bounce. So it seems to me that
if you want to share a buffer between URBs, it must be coherent.
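
To make the cacheline concern concrete, here is a small userspace sketch that
tests whether two byte ranges of one buffer touch a common cache line. The
helper names and the assumption that offsets are measured from a
line-aligned base are illustrative only, and the line size is a parameter
because it differs between architectures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Index of the cache line containing byte offset `off`. */
static size_t line_of(size_t off, size_t line_size)
{
	assert(line_size > 0);
	return off / line_size;
}

/*
 * True if [a, a + a_len) and [b, b + b_len) touch at least one common
 * cache line. Offsets are relative to a line-aligned base (assumption).
 */
static bool share_cacheline(size_t a, size_t a_len,
			    size_t b, size_t b_len, size_t line_size)
{
	size_t a_first, a_last, b_first, b_last;

	assert(a_len > 0 && b_len > 0);
	a_first = line_of(a, line_size);
	a_last = line_of(a + a_len - 1, line_size);
	b_first = line_of(b, line_size);
	b_last = line_of(b + b_len - 1, line_size);
	return a_first <= b_last && b_first <= a_last;
}
```

For the 16-byte buffer split into four 4-byte pieces, any two pieces share a
line for every line size of 16 bytes or more, so mapping one piece in place
while the CPU touches another is the situation the rules forbid.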

Regards
Oliver
From: Alan Stern on
On Fri, 9 Apr 2010, Oliver Neukum wrote:

> Am Freitag, 9. April 2010 00:20:36 schrieb Alan Stern:
> > > > That would work, but it doesn't match the way existing drivers use the
> > > > interface. For example, the audio driver allocates a 16-byte coherent
> > > > buffer and then uses four bytes from it for each of four different
> > > > URBs.
> > >
> > > That will not work with any fallback that does not yield a coherent buffer.
> >
> > What you mean isn't entirely clear. But it certainly does work in
> > various circumstances that don't yield coherent buffers. For example,
> > it works if the controller uses PIO instead of DMA. It also works if
> > the controller uses DMA and the URBs have to be bounced.
>
> It'll work on x86. On incoherent architectures this violates the cacheline
> rules for DMA-mapping if you have to bounce.

Not true. Consider: The driver allocates a 16-byte buffer (xbuf)
divided up into four sets of four bytes, and sets

urb[i].transfer_dma = xbuf_dma + 4*i;

Then usb_submit_urb(urb[i]) will copy the appropriate four bytes to a
bounce buffer and map the bounce buffer. Accesses to the other parts
of xbuf won't violate the cacheline rules, because xbuf isn't mapped
for DMA -- only the bounce buffer is. When urb[i] completes, the
bounce buffer contents will be copied back to the original four bytes
in xbuf. Again, there is no violation of cacheline rules.
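
The bounce sequence described above can be sketched in plain userspace C.
This is only a model under stated assumptions: struct fake_xfer and the
fake_*() functions are hypothetical stand-ins, not USB core APIs, and the
"device" is simulated with memset(). The point it illustrates is that the
device only ever touches the bounce copy, never xbuf itself.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SLICE_LEN  4	/* bytes used by each transfer */
#define NUM_SLICES 4	/* 16-byte buffer split four ways */

/* Hypothetical stand-in for one transfer and its bounce buffer. */
struct fake_xfer {
	unsigned char *orig;			/* slice inside the shared buffer */
	unsigned char bounce[SLICE_LEN];	/* the only memory the "device" sees */
};

/* Submit: copy the slice into the bounce buffer, which would then be mapped. */
static void fake_submit(struct fake_xfer *x, unsigned char *xbuf, size_t i)
{
	assert(i < NUM_SLICES);
	x->orig = xbuf + SLICE_LEN * i;
	memcpy(x->bounce, x->orig, SLICE_LEN);
}

/* Simulated device activity: writes land in the bounce buffer only. */
static void fake_device_write(struct fake_xfer *x, unsigned char value)
{
	memset(x->bounce, value, SLICE_LEN);
}

/* Completion: copy the bounce buffer back over the original four bytes. */
static void fake_complete(struct fake_xfer *x)
{
	memcpy(x->orig, x->bounce, SLICE_LEN);
}
```

Between fake_submit() and fake_complete() the other twelve bytes of the
shared buffer can be read or written freely, since nothing in that buffer is
mapped in this model.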

> So it seems to me that
> if you want to share a buffer between URBs, it must be coherent.

No. But it must be allocated via usb_alloc_buffer() (or whatever that
routine gets renamed to).

Alan Stern

From: Oliver Neukum on
Am Freitag, 9. April 2010 16:41:48 schrieb Alan Stern:
> > It'll work on x86. On incoherent architectures this violates the cacheline
> > rules for DMA-mapping if you have to bounce.
>
> Not true. Consider: The driver allocates a 16-byte buffer (xbuf)
> divided up into four sets of four bytes, and sets
>
> urb[i].transfer_dma = xbuf_dma + 4*i;
>
> Then usb_submit_urb(urb[i]) will copy the appropriate four bytes to a
> bounce buffer and map the bounce buffer. Accesses to the other parts
> of xbuf won't violate the cacheline rules, because xbuf isn't mapped
> for DMA -- only the bounce buffer is. When urb[i] completes, the
> bounce buffer contents will be copied back to the original four bytes
> in xbuf. Again, there is no violation of cacheline rules.

I think you are assuming that either every or no part of the buffer is mapped
for DMA in place. I don't think you can assume that.

Regards
Oliver