From: Mike Frysinger on
On Sat, May 8, 2010 at 18:32, Johannes Weiner wrote:
> On Fri, May 07, 2010 at 02:28:16PM -0400, Mike Frysinger wrote:
>> On Fri, May 7, 2010 at 06:15, Oskar Schirmer wrote:
>> > On Thu, May 06, 2010 at 14:46:04 -0400, Mike Frysinger wrote:
>> >> On Thu, May 6, 2010 at 06:37, Oskar Schirmer wrote:
>> >> >  struct ser_req {
>> >> > +       u16                     sample;
>> >> > +       char                    __padalign[L1_CACHE_BYTES - sizeof(u16)];
>> >> > +
>> >> >        u16                     reset;
>> >> >        u16                     ref_on;
>> >> >        u16                     command;
>> >> > -       u16                     sample;
>> >> >        struct spi_message      msg;
>> >> >        struct spi_transfer     xfer[6];
>> >> >  };
>> >>
>> >> are you sure this is necessary ?  ser_req is only ever used with
>> >> spi_sync() and it's allocated/released on the fly, so how could
>> >> anything be reading that memory between the start of the transmission
>> >> and the return to adi7877 ?
>> >
>> > msg is handed over to spi_sync, it contains the addresses
>> > which will be used to programme the DMA: the spi master
>> > transfer function will read these fields to start DMA.
>>
>> so the issue is coming from the SPI master drivers and not the AD7877 driver
>
> No, the issue is coming from ad7877 placing a transmission buffer
> into the same cache line with memory locations that are accessed outside
> the driver's scope.

you missed the point of my comment. as i clearly explained in the
other structure, the AD7877 driver was causing the cache desync. here
it is the SPI master that is implicitly causing it. i'm not talking
about the AD7877 being correct wrt the implicit SPI/DMA
requirements, just what code exactly is triggering the cache issues.

>  /*
>   * DMA (thus cache coherency maintainance) requires the
>   * transfer buffers to live in their own cache lines.
>   */
>   char         __padalign[...];
>
> ?  It might be obvious what the code does, but I agree with
> Mike that it might not be immediately apparent why it's needed.

comment looks fine once the spelling is fixed (maintenance). thanks.
-mike
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Oskar Schirmer on
On Sun, May 09, 2010 at 00:45:41 -0400, Mike Frysinger wrote:
> On Sat, May 8, 2010 at 18:32, Johannes Weiner wrote:
> > On Fri, May 07, 2010 at 02:28:16PM -0400, Mike Frysinger wrote:
> >> On Fri, May 7, 2010 at 06:15, Oskar Schirmer wrote:
> >> > On Thu, May 06, 2010 at 14:46:04 -0400, Mike Frysinger wrote:
> >> >> On Thu, May 6, 2010 at 06:37, Oskar Schirmer wrote:
> >> >> >  struct ser_req {
> >> >> > +       u16                     sample;
> >> >> > +       char                    __padalign[L1_CACHE_BYTES - sizeof(u16)];
> >> >> > +
> >> >> >        u16                     reset;
> >> >> >        u16                     ref_on;
> >> >> >        u16                     command;
> >> >> > -       u16                     sample;
> >> >> >        struct spi_message      msg;
> >> >> >        struct spi_transfer     xfer[6];
> >> >> >  };
> >> >>
> >> >> are you sure this is necessary ?  ser_req is only ever used with
> >> >> spi_sync() and it's allocated/released on the fly, so how could
> >> >> anything be reading that memory between the start of the transmission
> >> >> and the return to adi7877 ?
> >> >
> >> > msg is handed over to spi_sync, it contains the addresses
> >> > which will be used to programme the DMA: the spi master
> >> > transfer function will read these fields to start DMA.
> >>
> >> so the issue is coming from the SPI master drivers and not the AD7877 driver
> >
> > No, the issue is coming from ad7877 placing a transmission buffer
> > into the same cache line with memory locations that are accessed outside
> > the driver's scope.
>
> you missed the point of my comment. as i clearly explained in the
> other structure, the AD7877 driver was causing the cache desync. here
> it is the SPI master that is implicitly causing it. i'm not talking
> about the AD7877 being correct wrt the implicit SPI/DMA
> requirements, just what code exactly is triggering the cache issues.

In both cases ad7877 placed DMA buffers in the same
cache line as reference data needed by the spi master to
programme the DMA engine. In one case the machinery is started
through spi_sync, in the other through spi_async. Both cases end
up in master->transfer via spi_async. In both cases, with
drivers/spi/atmel_spi.c, the cache lines are flushed and then
the reference data is read to feed the DMA engine, thereby
causing the line in question to be cached again prematurely.
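
The sequence described above can be modelled in plain userspace C.
This is only a toy sketch, not kernel code: it assumes a single cache
line of 32 bytes, assumes nothing invalidates the line again after the
transfer completes, and all names (toy_cache, run_transfer, ...) are
made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define LINE_BYTES 32	/* assumed cache line size, illustration only */

/* One cache line: backing RAM (what DMA sees) plus the CPU's cached copy. */
struct toy_cache {
	uint8_t ram[LINE_BYTES];
	uint8_t line[LINE_BYTES];
	int valid;
};

/* Drop the cached copy, as done in preparation for a DMA transfer. */
static void toy_invalidate(struct toy_cache *c)
{
	c->valid = 0;
}

/* A CPU read refills the whole line from RAM if it is not cached. */
static uint8_t toy_cpu_read(struct toy_cache *c, size_t off)
{
	if (!c->valid) {
		memcpy(c->line, c->ram, LINE_BYTES);
		c->valid = 1;
	}
	return c->line[off];
}

/* The DMA engine writes to RAM only, bypassing the cache. */
static void toy_dma_write(struct toy_cache *c, size_t off, uint8_t v)
{
	c->ram[off] = v;
}

/*
 * Returns the byte the CPU sees in the DMA buffer (offset 0) after the
 * transfer. If reference data shares the line, reading it after the
 * invalidate pulls the stale buffer contents back into the cache.
 */
static uint8_t run_transfer(int reference_shares_line)
{
	struct toy_cache c = { 0 };

	toy_invalidate(&c);
	if (reference_shares_line)
		(void)toy_cpu_read(&c, 8);	/* master reads msg/xfer fields */
	toy_dma_write(&c, 0, 0xA5);		/* device delivers the sample */
	return toy_cpu_read(&c, 0);
}
```

With the reference data in the same line, run_transfer(1) yields the
stale 0x00; with the buffer in a line of its own, run_transfer(0)
yields the 0xA5 written by the DMA engine.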

Note that atmel_spi (thus the master) is not wrong here,
as it must assume that DMA buffers are correctly aligned
into separate cache lines, so accessing reference data
after the cache flush is legitimate. So in both cases the
problem is caused by ad7877 and is thus fixed analogously.
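
To illustrate the layout change, here is a userspace sketch comparing
the field offsets before and after the patch. It is not the driver
code: L1_CACHE_BYTES is assumed to be 32, the spi_message and
spi_transfer types are replaced by small hypothetical stubs, the
padding member is named padalign, and the struct is assumed to start
on a cache line boundary (as the patch description assumes for
kzalloc).

```c
#include <stddef.h>
#include <stdint.h>

#define L1_CACHE_BYTES 32	/* assumed value, arch dependent in the kernel */

/* Hypothetical stand-ins for the real SPI core structures. */
struct spi_message_stub { void *transfers; void *context; };
struct spi_transfer_stub { const void *tx_buf; void *rx_buf; size_t len; };

/* Layout before the patch: sample sits right next to msg. */
struct ser_req_old {
	uint16_t reset;
	uint16_t ref_on;
	uint16_t command;
	uint16_t sample;
	struct spi_message_stub msg;
	struct spi_transfer_stub xfer[6];
};

/* Layout after the patch: sample first, padded to a full cache line. */
struct ser_req_new {
	uint16_t sample;
	char padalign[L1_CACHE_BYTES - sizeof(uint16_t)];

	uint16_t reset;
	uint16_t ref_on;
	uint16_t command;
	struct spi_message_stub msg;
	struct spi_transfer_stub xfer[6];
};

/* Index of the cache line an offset falls into (struct assumed aligned). */
static size_t cache_line_of(size_t offset)
{
	return offset / L1_CACHE_BYTES;
}

/* Nonzero if two field offsets land in the same cache line. */
static int same_line(size_t a, size_t b)
{
	return cache_line_of(a) == cache_line_of(b);
}
```

In this model same_line() reports that sample and msg share a line in
ser_req_old, while in ser_req_new sample has line 0 to itself and
reset, ref_on, command and msg start in line 1.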

>
> >  /*
> >   * DMA (thus cache coherency maintainance) requires the
> >   * transfer buffers to live in their own cache lines.
> >   */
> >   char         __padalign[...];
> >
> > ?  It might be obvious what the code does, but I agree with
> > Mike that it might not be immediately apparent why it's needed.
>
> comment looks fine once the spelling is fixed (maintenance). thanks.

Ok, will prepare that soon.
Oskar
--
oskar schirmer, emlix gmbh, http://www.emlix.com
fon +49 551 30664-0, fax -11, bahnhofsallee 1b, 37081 göttingen, germany
sitz der gesellschaft: göttingen, amtsgericht göttingen hr b 3160
geschäftsführer: dr. uwe kracke, ust-idnr.: de 205 198 055

emlix - your embedded linux partner
From: Mike Frysinger on
On Mon, May 10, 2010 at 06:42, Oskar Schirmer wrote:
> With dma based spi transmission, data corruption
> is observed occasionally. With dma buffers located
> right next to msg and xfer fields, cache lines
> correctly flushed in preparation for dma usage
> may be polluted again when writing to fields
> in the same cache line.
>
> Make sure cache fields used with dma do not
> share cache lines with fields changed during
> dma handling. As both fields are part of a
> struct that is allocated via kzalloc, thus
> cache aligned, moving the fields to the 1st
> position and insert padding for alignment
> does the job.

Acked-by: Mike Frysinger <vapier(a)gentoo.org>

i'm guessing Dmitry will pick it up now
-mike
From: Dmitry Torokhov on
On Mon, May 10, 2010 at 12:39:49PM -0400, Mike Frysinger wrote:
> On Mon, May 10, 2010 at 06:42, Oskar Schirmer wrote:
> > With dma based spi transmission, data corruption
> > is observed occasionally. With dma buffers located
> > right next to msg and xfer fields, cache lines
> > correctly flushed in preparation for dma usage
> > may be polluted again when writing to fields
> > in the same cache line.
> >
> > Make sure cache fields used with dma do not
> > share cache lines with fields changed during
> > dma handling. As both fields are part of a
> > struct that is allocated via kzalloc, thus
> > cache aligned, moving the fields to the 1st
> > position and insert padding for alignment
> > does the job.
>
> Acked-by: Mike Frysinger <vapier(a)gentoo.org>
>
> i'm guessing Dmitry will pick it up now

Yep.

--
Dmitry
From: Mike Frysinger on
On Mon, May 10, 2010 at 17:22, Andrew Morton wrote:
> On Mon, 10 May 2010 12:42:34 +0200 "Oskar Schirmer" wrote:
>> With dma based spi transmission, data corruption
>> is observed occasionally. With dma buffers located
>> right next to msg and xfer fields, cache lines
>> correctly flushed in preparation for dma usage
>> may be polluted again when writing to fields
>> in the same cache line.
>>
>> Make sure cache fields used with dma do not
>> share cache lines with fields changed during
>> dma handling. As both fields are part of a
>> struct that is allocated via kzalloc, thus
>> cache aligned, moving the fields to the 1st
>> position and insert padding for alignment
>> does the job.
>
> This sounds odd.  Doesn't it imply that some code somewhere is missing
> some DMA synchronisation actions?

i think it's kind of dumb and induces this sort of bug
semi-frequently, but it is what the current DMA API requires (see e.g.
Documentation/spi/spi-summary)
-mike
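
Since this class of bug is easy to reintroduce, one way to guard
against it is a compile-time check on the layout. The following is a
userspace sketch only: it uses C11 _Alignas and _Static_assert rather
than the manual padding from the patch, the cache line size of 32 is
an assumption, and struct aligned_req and dma_field_is_isolated() are
hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define CACHE_LINE 32	/* assumed cache line size, illustration only */

struct aligned_req {
	uint16_t command;
	void *msg_stub;		/* stand-in for msg/xfer reference data */
	/* DMA target: aligned member, tail padding fills the rest of the line */
	_Alignas(CACHE_LINE) uint16_t sample;
};

/* Fail the build if the DMA field could share a line with other members. */
_Static_assert(offsetof(struct aligned_req, sample) % CACHE_LINE == 0,
	       "sample must start on a cache line boundary");
_Static_assert(sizeof(struct aligned_req) -
	       offsetof(struct aligned_req, sample) >= CACHE_LINE,
	       "sample must own its whole cache line");

/*
 * The layout only helps if the allocation itself is line aligned, so
 * use aligned_alloc() here. Returns 1 if the field is isolated.
 */
static int dma_field_is_isolated(void)
{
	struct aligned_req *r = aligned_alloc(CACHE_LINE, sizeof(*r));
	int ok;

	if (!r)
		return 0;
	ok = ((uintptr_t)r % CACHE_LINE == 0) &&
	     ((uintptr_t)&r->sample % CACHE_LINE == 0);
	free(r);
	return ok;
}
```

The static assertions catch a later reordering of the struct at build
time, which the padding array alone would not.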